We produce quite a lot of data! Much of it consists of business transactions stored in relational databases. Increasingly, though, it arrives as the unstructured, high-volume, rapidly changing datasets known in the industry as Big Data. The challenge for data integration professionals is to combine and transform this data into useful information, and to do so in near real time, often landing it in a target system such as Hadoop. The topic of this session, real-time data streaming, provides a great solution for this challenging task. By integrating GoldenGate, Oracle’s premier data replication technology, with Apache Kafka, the popular open-source streaming and messaging system, we can implement a fast, durable, and scalable solution.
Presented at Oracle OpenWorld 2016
4. info@rittmanmead.com www.rittmanmead.com @rittmanmead
About Rittman Mead
•World’s leading specialist partner for technical excellence, solutions delivery and innovation in Oracle Data Integration, Business Intelligence, Analytics and Big Data
•Providing our customers targeted expertise; we are a company that doesn’t try to do everything… only what we excel at
•70+ consultants worldwide including 1 Oracle ACE Director and 2 Oracle ACEs, offering training courses, global services, and consulting
•Founded on the values of collaboration, learning, integrity and getting things done
Unlock the potential of your organization’s data
•Comprehensive service portfolio designed to support the full lifecycle of any analytics solution
Example - Marketing
• Financial data stored in an RDBMS
• Social media data, web logs, Google Analytics, etc., all in various formats
• Bring it all together for analysis
‣ Marketing campaign effect on sales
Kafka - How is it used?
• Pure Event Streams
• System Metrics
• Derived Streams
• Hadoop Data Loads / Data Publishing
• Application Logs
• Database Changes
- Log Compaction
- Data cleansing
Image source: confluent.io
A simple example…
One view of the Oracle Data Integrator logs
• ODI session logs - stored in the repository database
• ODI Agent logs - text files
To see the full picture of your ODI environment, they must be combined
Steps to extract from the database
• Prepare the database
• Setup GoldenGate for Oracle Database
- Install and configure
• Setup Manager, Extract and Pump parameter files
• Add Extract and Pump process groups
• Start the Extract and Pump processes
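As a minimal sketch, the Extract and Pump steps above might look like this in GGSCI; the group names (eodi, podi), trail prefixes, and database user are illustrative, not from the original deck:

```
-- GGSCI on the source server (illustrative names and paths)
DBLOGIN USERID ggadmin, PASSWORD <password>

-- Add the Extract, capturing changes from the transaction log
ADD EXTRACT eodi, TRANLOG, BEGIN NOW
ADD EXTTRAIL ./dirdat/lt, EXTRACT eodi

-- Add the Pump, reading the local trail and writing a remote trail
ADD EXTRACT podi, EXTTRAILSOURCE ./dirdat/lt
ADD RMTTRAIL ./dirdat/rt, EXTRACT podi

-- Start both processes
START EXTRACT eodi
START EXTRACT podi
```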
Stream ODI Agent Logs to Kafka via Logstash
• Application log processing is a standard use for Kafka
• Logstash
- Part of the Elastic (formerly ELK) stack
- Robin Moffatt’s post: http://ritt.md/kafka-elk
- Producer configuration for Kafka
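A sketch of the Logstash pipeline for this: a file input tailing the agent log and a kafka output acting as producer. The log path, topic, and broker address are illustrative, and the kafka output option names vary across Logstash plugin versions:

```conf
# Logstash pipeline sketch (illustrative path, topic, and broker)
input {
  file {
    path => "/u01/app/odi/agent/log/odiagent.log"
    start_position => "beginning"
  }
}
output {
  kafka {
    bootstrap_servers => "localhost:9092"
    topic_id => "odi-agent-logs"
  }
}
```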
Logstash to Kafka - Setup and Startup
• Startup Zookeeper
- Elects controller broker
- Tracks brokers and topic config
- Manages access control and quotas
• Set Kafka server.properties
- Broker ID
- Number of partitions
- Log retention period
- Zookeeper connection
• Start Kafka
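As a sketch, the server.properties entries behind those bullets might look like this; the values are illustrative defaults, not tuning advice:

```properties
# config/server.properties (illustrative values)
broker.id=0
num.partitions=1
log.retention.hours=168
zookeeper.connect=localhost:2181
```

Startup then follows the standard Kafka distribution layout:

```
# Start Zookeeper, then the Kafka broker, from the Kafka install directory
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
```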
Oracle GoldenGate for Big Data
• Kafka - one of many handlers
- HDFS, HBase, Flume, Hive
• Pluggable Formatters
- Convert trail file transactions to alternate format
- Avro, delimited text, JSON, XML
• Metadata Provider
- Handles mapping of source to target columns that differ in structure/name
- Similar to SOURCEDEF file in GoldenGate
- Avro or Hive
Oracle GoldenGate for Big Data - Kafka Handler Setup
• Standard GoldenGate Extract / Pump processes
- Remember, no change here
• Replicat for Java parameter file & process group
• Kafka Handler configuration
• Kafka Producer properties
- Note: Kafka 0.9.0+ now certified with GoldenGate for Big Data 12.2.1.1
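A sketch of the Kafka Handler configuration; the gg.handler.* property names follow the OGG for Big Data conventions, while the topic name and file names are illustrative:

```properties
# Kafka Handler properties (illustrative topic and file names)
gg.handlerlist=kafkahandler
gg.handler.kafkahandler.type=kafka
gg.handler.kafkahandler.KafkaProducerConfigFile=custom_kafka_producer.properties
gg.handler.kafkahandler.TopicName=oggtopic
gg.handler.kafkahandler.format=avro_op
gg.handler.kafkahandler.mode=tx
```

The producer properties file it references holds the standard Kafka producer settings:

```properties
# custom_kafka_producer.properties (illustrative broker address)
bootstrap.servers=localhost:9092
acks=1
```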
Another approach…
• Kafka Connect Handler (Open Source)
- java.net/downloads/oracledi/GoldenGate
- Uses the Kafka Connect framework
- Can integrate with Confluent Platform & Schema Registry
- Tables = Topics
• Differences?
- OGG for Big Data Kafka Handler uses pluggable formatters
- Kafka Connect Handler builds up schemas and structs via the Kafka Connect API
Oracle GoldenGate for Big Data - Prerequisites
• Zookeeper & Kafka up and running
• Add topic to broker up front vs dynamically
- Option to create a topic per table (OGG for Big Data 12.2.0.1.1)
• Kafka Handler must have access to broker server
• Kafka libraries must match Kafka version
GoldenGate and Kafka - One Topic Per Table
• gg.handler.kafkahandler.topicPartitioning = table
- Option to split schema into one topic per table
- Topics can be created dynamically
• gg.handler.kafkahandler.mode = op
- Operation mode required to track individual table operations
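Taken together, the two settings above sit in the same handler properties file as the rest of the Kafka Handler configuration:

```properties
# Switch from one fixed topic to one topic per source table,
# and emit individual operations rather than whole transactions
gg.handler.kafkahandler.topicPartitioning=table
gg.handler.kafkahandler.mode=op
```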
GoldenGate and Kafka - Startup
• Create a topic in Kafka (or one per table)
• Add Replicat process group to GoldenGate on target
• Start Kafka console consumer
• Start GoldenGate extract/pump on source, replicat on target
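A sketch of the Kafka side of that startup sequence; the topic name is illustrative, and the flags shown are the Kafka 0.9-era ones (newer releases use --bootstrap-server for the console consumer):

```
# Create the topic, then watch it from a console consumer
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --replication-factor 1 --partitions 1 --topic oggtopic
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic oggtopic
```

On the GoldenGate target, the Replicat is added against the remote trail and started, with the process group name and trail prefix again illustrative:

```
-- GGSCI on the target server
ADD REPLICAT rkafka, EXTTRAIL ./dirdat/rt
START REPLICAT rkafka
```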
GoldenGate Big Data Adapter - What to Think About
• GoldenGate might be a single point of failure
- Kafka is a fault-tolerant, distributed system
• Source transactions may end up larger than expected
- max.request.size
• Performance considerations
- batch.size and linger.ms
• higher values = increased latency, better throughput
- BlockingSend = false and Mode = tx
- GROUPTRANSOPS
• Monitoring
- Confluent? Custom?
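The producer-side knobs mentioned above live in the Kafka producer properties file. As a sketch, with illustrative values (the comments note Kafka's documented defaults; the right settings depend on your workload):

```properties
# Bytes per partition batch; Kafka's default is 16384
batch.size=16384
# Default 0; raising it (e.g. 5-100 ms) batches more records per request,
# trading latency for throughput
linger.ms=5
# Default 1048576 (1 MB); raise if large source transactions exceed it
max.request.size=2097152
```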