Oracle Data Integrator and Oracle GoldenGate excel as standalone products, but paired together they are the perfect match for real-time data warehousing. Following Oracle’s Next Generation Reference Data Warehouse Architecture, this discussion will provide best practices on how to configure, implement, and process data in real-time using ODI and GoldenGate. Attendees will see common real-time challenges solved, including parent-child relationships within micro-batch ETL.
Presented at Rittman Mead BI Forum 2013 Masterclass.
GoldenGate and Oracle Data Integrator - A Perfect Match...
1. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
GoldenGate and Oracle Data Integrator - A Perfect Match...
Michael Rainey, Principal Consultant, Rittman Mead
Rittman Mead BI Forum 2013 Master Class, May 2013
2. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
•Real-time data warehousing is becoming standard across many organizations
•Oracle’s Reference Architecture for Information Management and Big Data
‣Staging, Foundation, and Access and Performance Layers
•Implementation of real-time data warehouse
‣Oracle GoldenGate - replication technology
‣Oracle Data Integrator - data integration & ETL
‣GoldenGate and ODI integration
•Real-time ETL
‣Standard approach using Journalized data
‣Solutions to common challenges
About this presentation...
3. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
Oracle Reference Architecture for Information Management and Big Data
4. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
Oracle Reference Architecture for Information Management and Big Data
5. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
Oracle Reference Architecture for Information Management and Big Data
6. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
Oracle Reference Architecture for Information Management and Big Data
7. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
Oracle GoldenGate
•Oracle’s standard tool for data replication
•Provides log-based capture, distribution, and delivery of committed transactions in real-time
‣Sub-second replication time
‣Minimal impact to source and target systems
‣Utilizes platform independent universal data format
•Replication of data between heterogeneous systems
‣Handles source and target column differences
•Uni-directional or bi-directional replication
•Easy to deploy - simple configuration of parameter files
8. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
Oracle Data Integrator 11g
•Oracle’s strategic product for data integration
•Supports batch, event-driven, and real-time integration
•Uses ELT (Extract, Load, Transform) approach
‣No middle ETL engine necessary
‣Utilizes power of target database to perform transformations
•Supports heterogeneous data sources
‣Oracle, SQL Server, XML, flat-file, MySQL, DB2...
•Declarative design - separation of business and technical integration
•Data integrity controls create a “data firewall”
•Extensible through “Knowledge Modules”
9. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
Oracle Data Integrator Journalizing (CDC)
•Change Data Capture (CDC)
‣Identify, capture, and deliver changes made to data in the source database
•Oracle Data Integrator CDC delivered through Journalizing
‣Journalizing Knowledge Module (JKM) performs setup and creates infrastructure
‣Simple vs Consistent Set
•ODI CDC Framework
‣Journals - tables (J$) hold references to the changed records and the change type (insert/update/delete)
‣Journalizing views - (JV$, JV$D) provides access to changed data, used by IKM’s and LKM’s
‣Capture processes - captures changed data from source datastores
-Database specific programs to retrieve log data from data server log files (Ex: Oracle GoldenGate)
‣Subscribers - entities that use the changed data tracked on a datastore or consistent set
-Data purged from journals after all subscribers have consumed changed data
10. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
•JKM Oracle to Oracle Consistent (OGG) Knowledge Module
‣Delivered with ODI
‣ODI Metadata used to generate GoldenGate
parameter files (extract, pump, replicat)
‣ReadMe.txt file created with instructions
•ODI CDC Framework generated
‣Staging table - replicate of the source
‣J$ (journal) table - change rows only
•Journalized data used in transformations
GoldenGate and ODI Integration
11. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
Staging Layer Load
•GoldenGate replication is setup and configured to keep
the Staging schema in sync with the Source
•All committed changes loaded to Staging in real-time
12. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
•Standard process was to load incremental changes from Staging to Foundation
‣Requires extra set of mappings
‣Increases latency of real-time data warehouse load
Foundation Layer Load
13. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
•GoldenGate will load Foundation directly
‣Reduced overall data warehouse load time
•A simple customization to the JKM will allow the
generation of source to Foundation GoldenGate
parameter files
‣Use INSERTALLRECORDS option for storing
transactional history
Foundation Layer Load
14. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
ODI and GoldenGate CDC Setup Process
•Setup Staging and Foundation Models
‣Add Datastores
‣Add Primary Keys
‣Add data warehouse audit columns
•Setup the GoldenGate JKM on each Model
‣Configure Options
‣Add each Datastore to CDC
‣Start Journal
•Follow ReadMe.txt instructions to complete GoldenGate setup
•Perform initial load of data from source to Staging and Foundation schemas
•Start GoldenGate replication
15. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
Set the Journalizing Knowledge Module Options - Staging
Datastore
Model
16. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
Journalizing Knowledge Module Options
•LOCAL_TEMP_DIR: Local path for generated parameter files
•SRC_LSCHEMA: Source Logical Schema
•SRC_DB_USER: Source GoldenGate user
•SRC_DB_PASSWORD: Source GoldenGate password
•SRC_OGG_PATH: Source GoldenGate install path
•SRC_SETUP_OGG_PROCESSES: Setup extract files if true
•STG_HOSTNAME: Target server hostname
•STG_MANAGER_PORT: Target GoldenGate install port
•STG_OGG_OBJECT_GROUP: Replicat file name
•STG_OGG_PATH: Target GoldenGate install path
•ENABLE_ODI_CDC: Setup the ODI CDC framework if true
•STG_OGG_TRACK_HISTORY: Custom option - store history
17. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
Configure GoldenGate
18. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
Configure GoldenGate
19. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
Set the Journalizing Knowledge Module Parameters - Foundation
20. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
Initial Load of Target and GoldenGate Startup
•Run initial load of source data before starting replication
‣Recommended tools: Oracle Datapump, Oracle Export/Import, DBLink
•Example initial load and GoldenGate startup process:
‣Follow instructions to setup the GoldenGate parameter files
‣Start the GoldenGate extract and pump processes
‣Run the initial load using Oracle Datapump as of SCN
‣Once the initial load has completed, start the GoldenGate
replicat process after the initial load SCN
-GGSCI >start replicat ODIT1T afterCSN 123456
‣Handling data collisions should not be necessary
21. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
Access and Performance Layer Load
•Moving change rows through to the star schema
‣Journalized data “out of the box”
‣Handling Parent-Child relationship
‣Subscription Views
22. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
Using ODI Journalized Data in an Interface
•Only one source Datastore can use “Journalized data” per Interface
‣Change view used as source
‣Filter added for Subscriber
•Extend window and lock subscribers prior to Interface execution
‣Ensures consistent dataset for the specific Subscriber
•Unlock subscribers and purge journal after execution
23. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
Journalized ODI Interface - Design
24. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
Journalized ODI Interface - Execution
25. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
Parent-Child Table Relationships
•Foreign key and parent-child relationships between Datastores
‣New or changed records in the Child table, Parent has no changes
‣Join between change datasets orphans child record
•Example:
‣Parent change view - JV$ON_FIELD_ROSTER
‣Child full table - GAME_PLAY_DETAILS
26. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
left outer join
Parent-child Tables Example
left outer join
Parent (change view) Child (full table)
Parent (change view) Child (full table)
27. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
Parent-child Tables Example
left outer join
left outer join
Parent (change view) Child (full table)
Parent (change view) Child (full table)
28. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
Parent-child Tables - Solution
•Create two ODI Interfaces
‣Interface 1: Parent table with Journalizing enabled joined to full child table
‣Interface 2: Child table with Journalizing enabled joined to full parent table
•Both Interfaces have same logic, column mappings, etc
•Consistent Journalizing must be used to ensure a consistent dataset
GoldenGate
Staging
ODI CDC ETL
Source Performance
29. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
Parent-child Tables - Design
30. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
Subscription Views
•Dynamic subscription views simplify development
‣Return a consistent set of data
‣Reduce the number of mappings
•Create a view for each Staging table
•ETL developer can choose dataset to be returned
‣Change rows (J$ table)
‣Current replicated rows (Staging table)
‣Full transactional history (Foundation table)
GoldenGate
Change Views
Staging
Subscription
Views
ETL
Source Performance
31. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
Subscription Views - SQL Code
32. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
Subscription Views - SQL Code
33. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
In summary...
•Real-time data warehousing is now a standard across many organizations
•Oracle’s Reference Architecture for Information Management and Big Data
provides a great structure for implementation
•GoldenGate and ODI as the delivery mechanism, while integrated, are the
perfect match for real-time data warehousing
•Real-time ETL can be achieved using ODI Change Data Capture
‣Parent-child relationships and subscription views
•More information can be found at http://www.rittmanmead.com
•Contact us at info@rittmanmead.com or michael.rainey@rittmanmead.com
•Follow-us on Twitter (@rittmanmead & @mRainey) or Facebook
(facebook.com/rittmanmead)
34. T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
GoldenGate and ODI - A Perfect Match...
Michael Rainey, Principal Consultant, Rittman Mead
Rittman Mead BI Forum 2013 Master Class, May 2013