Lessons learned building a connector
using Kafka Connect
Kate Stanley and Andrew Schofield
Kafka Summit SF 2019
IBM Event Streams
Apache Kafka
© 2019 IBM Corporation
“Kafka Connect is a tool for scalably and
reliably streaming data between Apache
Kafka and other systems”
IBM MQ
MESSAGE QUEUING: assured delivery
EVENT STREAMING: stream history
[Diagram: four MQ client applications connected to IBM MQ]
[Diagram: two Kafka client applications, with Kafka Connect linking Kafka to IBM MQ and its four MQ client applications]
Getting started with Kafka
Connect
Kafka Connect – an API for
implementing connectors
Kafka Connect – a runtime for executing
connector processes
[Diagram: a connector is packaged as a JAR (kafka-connect-mq-source.jar) and placed on the CLASSPATH of a Connect worker as a connector plugin; the worker then runs Connector X and its tasks (Task X)]
Getting started with Kafka Connect
kafka
|-- libs
|   |-- connect-api-2.3.0.jar
|   |-- connect-basic-auth-extension-2.3.0.jar
|   |-- connect-file-2.3.0.jar
|   |-- connect-json-2.3.0.jar
|   |-- connect-runtime-2.3.0.jar
|   `-- connect-transforms-2.3.0.jar
`-- bin
    |-- connect-distributed.sh
    `-- connect-standalone.sh
Getting started with Kafka Connect
FROM ibmcom/eventstreams-kafka
COPY connect-distributed.properties /opt/kafka/config/
COPY connectors /opt/connectors/
WORKDIR /opt/kafka
EXPOSE 8083
ENTRYPOINT ["./bin/connect-distributed.sh", "config/connect-distributed.properties"]
$ connect-distributed.sh connect-distributed.properties
$ connect-standalone.sh connect-standalone.properties connector1.properties
Getting started with Kafka Connect
$ curl http://localhost:8083/connector-plugins
[
  {
    "class":"org.apache.kafka.connect.file.FileStreamSinkConnector",
    "type":"sink",
    "version":"2.3.0"
  },
  {
    "class":"org.apache.kafka.connect.file.FileStreamSourceConnector",
    "type":"source",
    "version":"2.3.0"
  }
]
Getting started with Kafka Connect
$ echo '{
    "name":"kate-file-load",
    "config":{"connector.class":"FileStreamSource",
              "file":"config/server.properties",
              "topic":"kafka-config-topic"}}' |
  curl -X POST -d @- http://localhost:8083/connectors \
       --header "Content-Type: application/json"
$ curl http://localhost:8083/connectors
["kate-file-load"]
Connector internals
Anatomy of a connector
[Diagram: Connector X with three tasks, each labelled Task X]
Key considerations – partitions, tasks and topics
A source connector reads file.txt:
    1. Start
    2. The beginning
    3. The middle
    4. Conclusion
    5. Ending
    6. Finish
and writes its lines to a topic with two partitions:
    Partition 1: 1. Start, 3. The middle, 5. Ending
    Partition 2: 2. The beginning, 4. Conclusion, 6. Finish
A sink connector then writes the topic to file-copy.txt:
    1. Start
    3. The middle
    5. Ending
    2. The beginning
    4. Conclusion
    6. Finish
Ordering is preserved within each partition, not across the whole topic.
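The reordering in the file example can be reproduced with a small model. This is only a sketch in plain Java with no Kafka classes: the round-robin assignment of lines to partitions and the partition-by-partition read are assumptions chosen to match the slide, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative model only: no Kafka classes are used, and the round-robin
// assignment is an assumption chosen to match the slide.
class PartitionOrderingDemo {

    // Spread lines across partitions in round-robin order, as a source might.
    static List<List<String>> toPartitions(List<String> lines, int partitionCount) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int i = 0; i < lines.size(); i++) {
            partitions.get(i % partitionCount).add(lines.get(i));
        }
        return partitions;
    }

    // Read one partition after another, as a single sink task might.
    static List<String> drain(List<List<String>> partitions) {
        List<String> out = new ArrayList<>();
        for (List<String> partition : partitions) {
            out.addAll(partition);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> file = Arrays.asList("1. Start", "2. The beginning",
                "3. The middle", "4. Conclusion", "5. Ending", "6. Finish");
        // Each partition keeps its own order; the copy interleaves differently.
        for (String line : drain(toPartitions(file, 2))) {
            System.out.println(line);
        }
    }
}
```

If the order of the whole file matters, a single partition (or a key that routes related records to the same partition) is the usual way to keep it.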
Key considerations – Data formats
[Diagram: data passes through three formats: external system format <-> Kafka Connect internal format <-> Kafka record format]
Connector builders translate between the external system format and the Kafka Connect internal format
e.g. com.ibm.eventstreams.connect.mqsource.builders.JsonRecordBuilder
e.g. com.ibm.eventstreams.connect.mqsink.builders.JsonMessageBuilder
Converters translate between the Kafka Connect internal format and the Kafka record format
org.apache.kafka.connect.converters.ByteArrayConverter
org.apache.kafka.connect.storage.StringConverter
org.apache.kafka.connect.json.JsonConverter
Implementing the API
Lifecycle of a connector
Connector: initialize -> parse and validate config
version()
config()
validate(config)
start(config)
Connector config
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.common.config.ConfigDef.Importance;
import org.apache.kafka.common.config.ConfigDef.Type;

@Override
public ConfigDef config() {
    // No default value is given, so "config_option" is a required setting
    ConfigDef configDef = new ConfigDef();
    configDef.define("config_option", Type.STRING, Importance.HIGH, "Config option.");
    return configDef;
}
$ curl -X PUT -d '{"connector.class":"MyConnector"}' \
    --header "Content-Type: application/json" \
    http://localhost:8083/connector-plugins/MyConnector/config/validate
{"configs": [{
    "definition": {"name": "config_option", "importance": "HIGH", "default_value": null, …},
    "value": {
        "errors": ["Missing required configuration \"config_option\" which has no default value."],
        …
}
Lifecycle of a connector
Connector: initialize -> parse and validate config -> create tasks
version()
config()
validate(config)
start(config)
taskClass()
taskConfigs(max)
stop()
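The "create tasks" step is where a connector decides how to divide its work: in the Connect API, taskConfigs receives the maximum number of tasks and returns one configuration map per task. The sketch below shows one possible way to do that split in plain Java. It is an assumption for illustration, not how the MQ connectors necessarily do it; the "queues" property and the class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only: a real connector would do this inside
// taskConfigs(int maxTasks); the "queues" property is hypothetical.
class TaskConfigSketch {

    // Divide the queues round-robin across at most maxTasks task configs.
    static List<Map<String, String>> taskConfigs(List<String> queues, int maxTasks) {
        List<Map<String, String>> configs = new ArrayList<>();
        int taskCount = Math.min(maxTasks, queues.size());
        if (taskCount <= 0) {
            return configs; // nothing to do, so no tasks are created
        }
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            groups.add(new ArrayList<>());
        }
        for (int i = 0; i < queues.size(); i++) {
            groups.get(i % taskCount).add(queues.get(i));
        }
        for (List<String> group : groups) {
            Map<String, String> config = new HashMap<>();
            config.put("queues", String.join(",", group));
            configs.add(config);
        }
        return configs;
    }

    public static void main(String[] args) {
        // Three queues shared between two tasks
        System.out.println(taskConfigs(Arrays.asList("Q1", "Q2", "Q3"), 2));
    }
}
```

Capping the task count at the number of queues avoids starting tasks that would have nothing to read.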
Lifecycle of a connector
Connector: initialize -> parse and validate config -> create tasks
Source Task: initialize -> running
version()
start(config)
poll()
commit()
commitRecord(record)
stop()
Lifecycle of a connector
Connector: initialize -> parse and validate config -> create tasks
Sink Task: initialize -> running
version()
start(config)
put(records)
flush(offsets)
stop()
Kafka Connect and IBM MQ
It’s easy to connect IBM MQ to Apache Kafka
IBM has created a pair of connectors, available as source code or as part of IBM Event
Streams
Source connector
From MQ queue to Kafka topic
https://github.com/ibm-messaging/kafka-connect-mq-source
Sink connector
From Kafka topic to MQ queue
https://github.com/ibm-messaging/kafka-connect-mq-sink
Fully supported by IBM for customers with support entitlement for IBM Event Streams
Where can I get these magnificent connectors?
https://ibm.github.io/event-streams/connectors/
Connecting IBM MQ to Apache Kafka
The connectors are deployed into a Kafka Connect runtime
This runs between IBM MQ and Apache Kafka
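[Diagram: in IBM MQ, clients use the queues TO.KAFKA.Q and FROM.KAFKA.Q. One Kafka Connect worker runs the MQ source connector, which copies messages from TO.KAFKA.Q to the Kafka topic FROM.MQ.TOPIC. Another Kafka Connect worker runs the MQ sink connector, which copies records from the Kafka topic TO.MQ.TOPIC to FROM.KAFKA.Q.]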
Running Kafka Connect on a mainframe
IBM MQ Advanced for z/OS VUE provides support for the Kafka Connect workers to
be deployed onto z/OS Unix System Services using bindings connections to MQ
[Diagram: the same topology, with both Kafka Connect workers running in Unix System Services and connecting to IBM MQ Advanced for z/OS VUE over bindings connections]
Design of the MQ sink
connector
MQ sink connector
[Diagram: Kafka record (key, value as byte[]) from TO.MQ.TOPIC -> Converter -> SinkRecord (schema, value, which may be complex) -> MessageBuilder -> MQ message (MQMD, (MQRFH2), payload) -> FROM.KAFKA.Q]
Sink task – Design
Sink connector is relatively simple
The interface is synchronous and fits MQ quite well
Balancing efficiency with resource limits is the key
put(Collection<SinkRecord> records)
Converts Kafka records to MQ messages and sends in a transaction
Always requests a flush to avoid hitting MQ transaction limits
flush(Map<TopicPartition, OffsetAndMetadata> currentOffsets)
Commits any pending sends
This batches messages into MQ without excessively large batches
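The put/flush split described above can be modelled in a few lines. This is a plain-Java sketch, not the connector's code: the two lists stand in for an MQ transaction and the target queue, the "MQ:" prefix stands in for the message builder, and the class name is hypothetical. In the real API, a sink task can ask the framework for an early commit through SinkTaskContext.requestCommit(); here that is reduced to a boolean flag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative model only: the lists stand in for an MQ transaction and queue.
class SinkBatchingSketch {
    private final List<String> inTransaction = new ArrayList<>(); // sent, not yet committed
    private final List<String> committed = new ArrayList<>();     // visible on the queue
    private boolean flushRequested = false;

    // Convert each record to a message and send it inside the open transaction.
    void put(List<String> records) {
        for (String record : records) {
            inTransaction.add("MQ:" + record); // stand-in for the message builder
        }
        // Always ask for a flush so the transaction does not grow without bound.
        flushRequested = true;
    }

    // Commit any pending sends.
    void flush() {
        committed.addAll(inTransaction);
        inTransaction.clear();
        flushRequested = false;
    }

    List<String> committed() {
        return new ArrayList<>(committed);
    }

    boolean flushRequested() {
        return flushRequested;
    }

    public static void main(String[] args) {
        SinkBatchingSketch task = new SinkBatchingSketch();
        task.put(Arrays.asList("a", "b"));
        System.out.println("before flush: " + task.committed());
        task.flush();
        System.out.println("after flush: " + task.committed());
    }
}
```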
Design of the MQ source
connector
MQ source connector
[Diagram: MQ message (MQMD, (MQRFH2), payload) from TO.KAFKA.Q -> RecordBuilder -> SourceRecord (schema, value, which may be complex) -> Converter -> Kafka record (null key, value as byte[]) -> topic, labelled TO.MQ.TOPIC on the slide]
Source task – Original design
Source connector is more complicated
It’s multi-threaded and asynchronous, which fits MQ less naturally
List<SourceRecord> poll()
Waits up to 30 seconds for MQ messages and returns them as a batch
Multiple calls to poll() could contribute to a single MQ transaction
commit()
Asynchronously commits the active MQ transaction
Works quite well, but commit() is called too infrequently under load, which causes throttling
commit() does ensure that the most recent batch of messages polled has been acked by Kafka, but it doesn’t quite feel like the right way to do it
Source task – Revised design
Changed so each call to poll() comprises a single MQ transaction
commit() is no longer used in normal operation
List<SourceRecord> poll()
Waits for records from the previous poll() to be acked by Kafka
Commits the active MQ transaction – the previous batch
Waits up to 30 seconds for MQ messages and returns them as a new batch
commitRecord(SourceRecord record)
Just counts up the acks for the records sent
MQ transactions are much shorter-lived
No longer throttles under load
Feels a much better fit for the design of Kafka Connect
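The revised flow (wait for acks, commit the previous batch, fetch the next one) can be sketched as a small model. This is plain Java under stated assumptions, not the connector's implementation: an in-memory queue stands in for MQ, a counter stands in for committed MQ transactions, strings stand in for SourceRecord, and the class name and batch size are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustrative model only: an in-memory queue stands in for MQ.
class SourcePollSketch {
    private final Queue<String> mqQueue = new ArrayDeque<>();
    private final int batchSize;
    private volatile CountDownLatch pendingAcks = new CountDownLatch(0);
    private boolean transactionOpen = false;
    private int committedTransactions = 0;

    SourcePollSketch(List<String> messages, int batchSize) {
        this.mqQueue.addAll(messages);
        this.batchSize = batchSize;
    }

    List<String> poll() {
        // 1. Wait for the records from the previous poll() to be acked.
        try {
            if (!pendingAcks.await(30, TimeUnit.SECONDS)) {
                return new ArrayList<>(); // still waiting; keep the transaction open
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new ArrayList<>();
        }
        // 2. Commit the active transaction, which covers the previous batch.
        if (transactionOpen) {
            committedTransactions++;
            transactionOpen = false;
        }
        // 3. Receive a new batch inside a new transaction.
        List<String> batch = new ArrayList<>();
        while (batch.size() < batchSize && !mqQueue.isEmpty()) {
            batch.add(mqQueue.poll());
        }
        if (!batch.isEmpty()) {
            transactionOpen = true;
            pendingAcks = new CountDownLatch(batch.size());
        }
        return batch;
    }

    // Called once per record after it has been acked; just counts the ack.
    void commitRecord(String record) {
        pendingAcks.countDown();
    }

    int committedTransactions() {
        return committedTransactions;
    }

    public static void main(String[] args) {
        SourcePollSketch task = new SourcePollSketch(Arrays.asList("m1", "m2", "m3"), 2);
        for (String record : task.poll()) {
            task.commitRecord(record); // pretend every record was acked
        }
        System.out.println(task.poll());
    }
}
```

Because each transaction covers exactly one batch, its lifetime is bounded by how long that batch takes to be acked rather than by how often commit() happens to be called.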
Stopping a source task is tricky
stop() is called on SourceTask to indicate the task should stop
It runs asynchronously with respect to the polling and commit threads
Can’t be sure whether poll() or commit() is currently active or will be called very soon
Since poll() and commit() may both want access to the MQ connection, it’s not clear when it’s safe to close it
KIP-419: Safely notify Kafka Connect SourceTask is stopped
Adds a stopped() method to SourceTask that is guaranteed to be the final call to the task
[State diagram: uninitialized -> initialize() -> initialized -> start() -> running -> stop() -> stopping -> stopped()]
poll(), commit() and commitRecord() can be called while running and while stopping
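The state diagram suggests a simple rule: stop() only signals, and the connection is released in the final stopped() call. The sketch below models that rule in plain Java. It assumes the stopped() callback as the slide describes the KIP-419 proposal; the class, the boolean standing in for the MQ connection, and the return value of poll() are hypothetical.

```java
// Illustrative model only: stopped() follows the KIP-419 proposal as
// described on the slide; the boolean stands in for the MQ connection.
class StoppableTaskSketch {
    enum State { UNINITIALIZED, INITIALIZED, RUNNING, STOPPING, STOPPED }

    private State state = State.UNINITIALIZED;
    private boolean connectionOpen = false;

    synchronized void initialize() {
        state = State.INITIALIZED;
    }

    synchronized void start() {
        connectionOpen = true;
        state = State.RUNNING;
    }

    // Returns true if the call could safely use the connection.
    synchronized boolean poll() {
        return (state == State.RUNNING || state == State.STOPPING) && connectionOpen;
    }

    // Only signals that the task should stop; poll() may still be in flight.
    synchronized void stop() {
        state = State.STOPPING;
    }

    // Final call to the task, so this is the safe place to close the connection.
    synchronized void stopped() {
        connectionOpen = false;
        state = State.STOPPED;
    }

    synchronized State state() {
        return state;
    }

    synchronized boolean connectionOpen() {
        return connectionOpen;
    }

    public static void main(String[] args) {
        StoppableTaskSketch task = new StoppableTaskSketch();
        task.initialize();
        task.start();
        task.stop();
        System.out.println("poll after stop(): " + task.poll());
        task.stopped();
        System.out.println("poll after stopped(): " + task.poll());
    }
}
```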
Summary
Over 80 connectors
IBM MQ
HDFS
Elasticsearch
MySQL
JDBC
MQTT
CoAP
+ many others
Summary
Connector initialize
parse and
validate config
create tasks
Sink Task initialize running
Source Task initialize running
Thank you
Kate Stanley: @katestanley91
Andrew Schofield: https://medium.com/@andrew_schofield
Kafka Connect:
https://kafka.apache.org/documentation/#connect
https://github.com/ibm-messaging/kafka-connect-mq-source
https://github.com/ibm-messaging/kafka-connect-mq-sink
IBM Event Streams:
https://ibm.github.io/event-streams/connectors/
ibm.com/cloud/event-streams

Lessons Learned Building a Connector Using Kafka Connect (Katherine Stanley & Andrew Schofield, IBM UK) Kafka Summit SF 2019

  • 1. Lessons learned building a connector using Kafka Connect Kate Stanley and Andrew Schofield Kafka Summit SF 2019 IBM Event StreamsApache Kafka© 2019 IBM Corporation
  • 2. “Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems” © 2019 IBM Corporation
  • 3. IBM MQ © 2019 IBM Corporation
  • 4. MESSAGE QUEUING EVENT STREAMING Assured delivery Stream history © 2019 IBM Corporation
  • 8. Getting started with Kafka Connect
  • 9. Kafka Connect – an API for implementing connectors
  • 10. Kafka Connect – a runtime for executing connector processes
  • 11. Connector
  • 15. Connector (diagram labels): Connect worker; CLASSPATH; kafka-connect-mq-source.jar; Connector X; Task X; Task X; Connector plugin; Connector
  • 16. Getting started with Kafka Connect
        kafka
        |-- libs
        |   |-- connect-api-2.3.0.jar
        |   |-- connect-basic-auth-extension-2.3.0.jar
        |   |-- connect-file-2.3.0.jar
        |   |-- connect-json-2.3.0.jar
        |   |-- connect-runtime-2.3.0.jar
        |   `-- connect-transforms-2.3.0.jar
        `-- bin
            |-- connect-distributed.sh
            `-- connect-standalone.sh
  • 17. Getting started with Kafka Connect
        FROM ibmcom/eventstreams-kafka
        COPY connect-distributed.properties /opt/kafka/config/
        COPY connectors /opt/connectors/
        WORKDIR /opt/kafka
        EXPOSE 8083
        ENTRYPOINT ["./bin/connect-distributed.sh", "config/connect-distributed.properties"]
  • 19. Getting started with Kafka Connect
        $ curl http://localhost:8083/connector-plugins
        [
          {
            "class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
            "type": "sink",
            "version": "2.3.0"
          },
          {
            "class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
            "type": "source",
            "version": "2.3.0"
          }
        ]
  • 20. Getting started with Kafka Connect
        $ echo '{"name":"kate-file-load", "config":{"connector.class":"FileStreamSource", "file":"config/server.properties", "topic":"kafka-config-topic"}}' | curl -X POST -d @- http://localhost:8083/connectors --header "Content-Type: application/json"
        $ curl http://localhost:8083/connectors
        ["kate-file-load"]
  • 21. Connector internals
  • 22. Anatomy of a connector (diagram labels): Connector X; Task X; Task X; Task X
  • 23. Anatomy of a connector
  • 24. Key considerations – partitions, tasks and topics
  • 25. Key considerations – partitions, tasks and topics. file.txt contains: 1. Start; 2. The beginning; 3. The middle; 4. Conclusion; 5. Ending; 6. Finish
  • 26. Key considerations – partitions, tasks and topics. A source connector reads file.txt (1. Start; 2. The beginning; 3. The middle; 4. Conclusion; 5. Ending; 6. Finish) into a topic with two partitions. Partition 1: 1. Start; 3. The middle; 5. Ending. Partition 2: 2. The beginning; 4. Conclusion; 6. Finish
  • 27. Key considerations – partitions, tasks and topics. As on the previous slide, a source connector reads file.txt into a topic with two partitions (Partition 1: 1. Start; 3. The middle; 5. Ending. Partition 2: 2. The beginning; 4. Conclusion; 6. Finish). A sink connector then writes the topic to file-copy.txt, which contains: 1. Start; 3. The middle; 5. Ending; 2. The beginning; 4. Conclusion; 6. Finish
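The reordering shown on slides 25 to 27 can be reproduced with a small sketch. It uses plain Java lists in place of topic partitions and no Kafka client at all. The class `PartitionOrderingDemo`, its methods, round-robin assignment of lines to partitions, and a sink that drains one partition fully before the next are all assumptions made for illustration, not the behaviour of any particular connector.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: plain lists stand in for topic partitions; no Kafka client is used. */
final class PartitionOrderingDemo {

    /** Spreads lines over partitions round-robin (an assumed assignment strategy). */
    static List<List<String>> distribute(List<String> lines, int partitionCount) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int i = 0; i < lines.size(); i++) {
            partitions.get(i % partitionCount).add(lines.get(i));
        }
        return partitions;
    }

    /** Reads each partition in full, one after another (one possible sink ordering). */
    static List<String> drain(List<List<String>> partitions) {
        List<String> out = new ArrayList<>();
        for (List<String> partition : partitions) {
            out.addAll(partition);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> file = List.of("1. Start", "2. The beginning", "3. The middle",
                "4. Conclusion", "5. Ending", "6. Finish");
        // Order is preserved within each partition, but not across the whole topic.
        System.out.println(drain(distribute(file, 2)));
    }
}
```

With a single partition the copy matches the original file, which is why the number of partitions matters when ordering across the whole data set is important.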
  • 28. Key considerations – Data formats
  • 29. Key considerations – Data formats: external system format; Kafka record format; Kafka Connect internal format
  • 30. Key considerations – Data formats: external system format; Kafka record format; Kafka Connect internal format. Connector builders, e.g. com.ibm.eventstreams.connect.mqsource.builders.JsonRecordBuilder and com.ibm.eventstreams.connect.mqsink.builders.JsonMessageBuilder
  • 32. Implementing the API
  • 33. Lifecycle of a connector. Connector methods: version(), config(), validate(config), start(config). Phases: initialize; parse and validate config
  • 34. Connector config
        @Override
        public ConfigDef config() {
            ConfigDef configDef = new ConfigDef();
            configDef.define("config_option", Type.STRING, Importance.HIGH, "Config option.");
            return configDef;
        }
        $ curl -X PUT -H "Content-Type: application/json" -d '{"connector.class":"MyConnector"}' http://localhost:8083/connector-plugins/MyConnector/config/validate
        {"configs": [{
          "definition": {"name": "config_option", "importance": "HIGH", "default_value": null, …},
          "value": {"errors": ["Missing required configuration \"config_option\" which has no default value."], … }
  • 35. Lifecycle of a connector. Connector methods: version(), config(), validate(config), start(config), taskClass(), taskConfigs(max), stop(). Phases: initialize; parse and validate config; create tasks
  • 36. Lifecycle of a connector. Connector phases: initialize; parse and validate config; create tasks. Source Task methods: version(), start(config), poll(), commit(), commitRecord(record), stop(). Source Task phases: initialize; running
  • 37. Lifecycle of a connector. Connector phases: initialize; parse and validate config; create tasks. Sink Task methods: version(), start(config), put(records), flush(offsets), stop(). Sink Task phases: initialize; running
  • 38. Kafka Connect and IBM MQ
  • 39. It’s easy to connect IBM MQ to Apache Kafka. IBM has created a pair of connectors, available as source code or as part of IBM Event Streams. Source connector: from MQ queue to Kafka topic (https://github.com/ibm-messaging/kafka-connect-mq-source). Sink connector: from Kafka topic to MQ queue (https://github.com/ibm-messaging/kafka-connect-mq-sink). Fully supported by IBM for customers with support entitlement for IBM Event Streams
  • 40. Where can I get these magnificent connectors? https://ibm.github.io/event-streams/connectors/
  • 41. Connecting IBM MQ to Apache Kafka. The connectors are deployed into a Kafka Connect runtime, which runs between IBM MQ and Apache Kafka. Diagram (client connections to IBM MQ): the MQ source connector, in a Kafka Connect worker, moves messages from queue TO.KAFKA.Q to topic FROM.MQ.TOPIC; the MQ sink connector, in a Kafka Connect worker, moves records from topic TO.MQ.TOPIC to queue FROM.KAFKA.Q
  • 42. Running Kafka Connect on a mainframe. IBM MQ Advanced for z/OS VUE provides support for the Kafka Connect workers to be deployed onto z/OS Unix System Services using bindings connections to MQ. Diagram (bindings connections, workers running in Unix System Services): the MQ source connector moves messages from queue TO.KAFKA.Q to topic FROM.MQ.TOPIC; the MQ sink connector moves records from topic TO.MQ.TOPIC to queue FROM.KAFKA.Q
  • 43. Design of the MQ sink connector
  • 44. MQ sink connector. Diagram: a Kafka record (key, value byte[]) from TO.MQ.TOPIC passes through the Converter to become a SinkRecord (value, which may be complex, plus schema); the MessageBuilder turns it into an MQ message (payload, MQMD, (MQRFH2)) that is put to FROM.KAFKA.Q
  • 45. Sink task – Design. The sink connector is relatively simple: the interface is synchronous and fits MQ quite well. Balancing efficiency with resource limits is the key. put(Collection<SinkRecord> records): converts Kafka records to MQ messages and sends them in a transaction; always requests a flush to avoid hitting MQ transaction limits. flush(Map<TopicPartition, OffsetAndMetadata> currentOffsets): commits any pending sends. This batches messages into MQ without creating excessively large batches
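The put/flush split described on slide 45 can be sketched without any Kafka or MQ libraries. Everything here is hypothetical: `TransactedSender` stands in for a transacted MQ producer, the `Runnable` stands in for asking the framework to flush soon, and strings stand in for records. It is a minimal illustration of the batching idea, not the connector's actual code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical stand-in for a transacted MQ producer; not the IBM MQ or JMS API. */
interface TransactedSender {
    void send(String message); // sends under the currently open transaction
    void commit();             // commits everything sent since the last commit
}

final class BatchingSinkSketch {
    private final TransactedSender sender;
    private final Runnable requestFlush; // stands in for asking the framework to flush soon
    private int uncommitted = 0;

    BatchingSinkSketch(TransactedSender sender, Runnable requestFlush) {
        this.sender = sender;
        this.requestFlush = requestFlush;
    }

    /** Mirrors put(): send every record in the open transaction, then ask for a flush. */
    void put(Collection<String> records) {
        if (records.isEmpty()) {
            return;
        }
        for (String record : records) {
            sender.send(record);
            uncommitted++;
        }
        requestFlush.run(); // keeps the open transaction from growing without bound
    }

    /** Mirrors flush(): commit whatever has been sent so far. */
    void flush() {
        if (uncommitted > 0) {
            sender.commit();
            uncommitted = 0;
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        TransactedSender sender = new TransactedSender() {
            @Override public void send(String message) { log.add("send:" + message); }
            @Override public void commit() { log.add("commit"); }
        };
        BatchingSinkSketch sink = new BatchingSinkSketch(sender, () -> log.add("flush-requested"));
        sink.put(List.of("x", "y"));
        sink.flush();
        System.out.println(log);
    }
}
```

Requesting a flush after every non-empty put() trades a little throughput for bounded transaction size, which is the balance between efficiency and resource limits that the slide describes.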
  • 46. Design of the MQ source connector
  • 47. MQ source connector. Diagram labels: MQ message (payload, MQMD, (MQRFH2)) on TO.KAFKA.Q; RecordBuilder; SourceRecord (value, which may be complex, plus schema); Converter; Kafka record (value byte[]); Null; TO.MQ.TOPIC
  • 48. Source task – Original design. The source connector is more complicated: it’s multi-threaded and asynchronous, which fits MQ less naturally. List<SourceRecord> poll(): waits for up to 30 seconds for MQ messages and returns them as a batch; multiple calls to poll() could contribute to an MQ transaction. commit(): asynchronously commits the active MQ transaction. This works quite well, but commit() is called too infrequently under load, which causes throttling. commit() does ensure that the most recent batch of messages polled has been acked by Kafka, but it doesn’t quite feel like the right way to do it
  • 49. Source task – Revised design. Changed so that each call to poll() comprises a single MQ transaction; commit() is no longer used in normal operation. List<SourceRecord> poll(): waits for records from the previous poll() to be acked by Kafka; commits the active MQ transaction (the previous batch); waits for up to 30 seconds for MQ messages and returns them as a new batch. commitRecord(SourceRecord record): just counts up the acks for the records sent. MQ transactions are much shorter-lived, the task no longer throttles under load, and the design feels a much better fit for Kafka Connect
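The revised poll() sequence on slide 49 can be sketched as follows, assuming a hypothetical `QueueSession` interface in place of the MQ client and strings in place of SourceRecord objects. The names `AckGatedPoller` and `InMemoryQueueSession`, and the use of a CountDownLatch to count acks, are illustrative choices and are not taken from the connector's source.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a transacted MQ session; not the IBM MQ or JMS API. */
interface QueueSession {
    List<String> receiveBatch(int maxMessages); // receives under the open transaction
    void commit();                               // commits the open transaction
}

/** In-memory fake used only to exercise the sketch. */
final class InMemoryQueueSession implements QueueSession {
    private final Deque<String> queue = new ArrayDeque<>();
    int commits = 0;

    InMemoryQueueSession(List<String> messages) { queue.addAll(messages); }

    @Override public List<String> receiveBatch(int maxMessages) {
        List<String> batch = new ArrayList<>();
        while (batch.size() < maxMessages && !queue.isEmpty()) {
            batch.add(queue.poll());
        }
        return batch;
    }

    @Override public void commit() { commits++; }
}

final class AckGatedPoller {
    private final QueueSession session;
    private volatile CountDownLatch pendingAcks = new CountDownLatch(0);
    private int inFlight = 0; // size of the batch whose transaction is still open

    AckGatedPoller(QueueSession session) { this.session = session; }

    /** Mirrors poll(): wait for acks, commit the previous batch, then fetch a new one. */
    List<String> poll(int maxMessages, long ackTimeoutMs) {
        try {
            if (!pendingAcks.await(ackTimeoutMs, TimeUnit.MILLISECONDS)) {
                return List.of(); // previous batch not fully acked; keep its transaction open
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return List.of();
        }
        if (inFlight > 0) {
            session.commit(); // the previous batch is safely in Kafka
        }
        List<String> batch = session.receiveBatch(maxMessages);
        inFlight = batch.size();
        pendingAcks = new CountDownLatch(batch.size());
        return batch;
    }

    /** Mirrors commitRecord(): called once per record acknowledged by Kafka. */
    void commitRecord() { pendingAcks.countDown(); }

    public static void main(String[] args) {
        InMemoryQueueSession session = new InMemoryQueueSession(List.of("a", "b", "c"));
        AckGatedPoller poller = new AckGatedPoller(session);
        List<String> first = poller.poll(2, 10);
        first.forEach(record -> poller.commitRecord()); // simulate Kafka acks
        System.out.println(first + " then " + poller.poll(2, 10) + ", commits=" + session.commits);
    }
}
```

Because each batch is committed as soon as its acks arrive, at most one batch is ever held in an open transaction, which matches the shorter-lived transactions noted on the slide.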
  • 50. Stopping a source task is tricky. stop() is called on SourceTask to indicate the task should stop. It runs asynchronously with respect to the polling and commit threads, so the task can’t be sure whether poll() or commit() is currently active or will be called very soon. Since poll() and commit() may both want access to the MQ connection, it’s not clear when it’s safe to close it. KIP-419: Safely notify Kafka Connect SourceTask is stopped. Adds a stopped() method to SourceTask that is guaranteed to be the final call to the task. State diagram: states uninitialized, initialized, running, stopping; transitions initialize(), start(), stop(), stopped(); poll(), commit() and commitRecord() can be called in both the running and stopping states
  • 51. Summary. Over 80 connectors: IBM MQ, HDFS, Elasticsearch, MySQL, JDBC, MQTT, CoAP + many others
  • 52. Summary. Connector: initialize; parse and validate config; create tasks. Sink Task: initialize; running. Source Task: initialize; running
  • 53. Summary
  • 54. Thank you. Kate Stanley (@katestanley91) and Andrew Schofield (https://medium.com/@andrew_schofield). Kafka Connect and IBM Event Streams links: https://kafka.apache.org/documentation/#connect ; https://github.com/ibm-messaging/kafka-connect-mq-source ; https://github.com/ibm-messaging/kafka-connect-mq-sink ; https://ibm.github.io/event-streams/connectors/ ; ibm.com/cloud/event-streams