1. Lessons learned building a connector
using Kafka Connect
Kate Stanley and Andrew Schofield
Kafka Summit SF 2019
IBM Event Streams
Apache Kafka
© 2019 IBM Corporation
2. “Kafka Connect is a tool for scalably and
reliably streaming data between Apache
Kafka and other systems”
9. Kafka Connect – an API for
implementing connectors
10. Kafka Connect – a runtime for executing
connector processes
16. Getting started with Kafka Connect
kafka
├── libs
│   ├── connect-api-2.3.0.jar
│   ├── connect-basic-auth-extension-2.3.0.jar
│   ├── connect-file-2.3.0.jar
│   ├── connect-json-2.3.0.jar
│   ├── connect-runtime-2.3.0.jar
│   └── connect-transforms-2.3.0.jar
└── bin
    ├── connect-distributed.sh
    └── connect-standalone.sh
17. Getting started with Kafka Connect
FROM ibmcom/eventstreams-kafka
COPY connect-distributed.properties /opt/kafka/config/
COPY connectors /opt/connectors/
WORKDIR /opt/kafka
EXPOSE 8083
ENTRYPOINT ["./bin/connect-distributed.sh", \
            "config/connect-distributed.properties"]
19. Getting started with Kafka Connect
$ curl http://localhost:8083/connector-plugins
[
{
"class":"org.apache.kafka.connect.file.FileStreamSinkConnector",
"type":"sink",
"version":"2.3.0"
},
{
"class":"org.apache.kafka.connect.file.FileStreamSourceConnector",
"type":"source",
"version":"2.3.0"
}
]
20. Getting started with Kafka Connect
$ echo '{
"name":"kate-file-load",
"config":{"connector.class":"FileStreamSource",
"file":"config/server.properties",
"topic":"kafka-config-topic"}}' |
curl -X POST -d @- http://localhost:8083/connectors \
--header "Content-Type:application/json"
$ curl http://localhost:8083/connectors
["kate-file-load"]
22. Anatomy of a connector
[Diagram: a single Connector instance creates and manages multiple Task instances]
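The fan-out above is what `Connector.taskConfigs(int maxTasks)` produces: one configuration map per task. A minimal sketch of that splitting logic, assuming a hypothetical `FileSplitConnector` with an illustrative `"files"` config key (not the real Kafka Connect API):

```java
import java.util.*;

// Sketch of how a connector divides work among its tasks, in the spirit of
// Connector.taskConfigs(int maxTasks). FileSplitConnector and the "files"
// key are hypothetical stand-ins.
public class FileSplitConnector {
    // Each returned map is the configuration the runtime hands to one Task.
    public static List<Map<String, String>> taskConfigs(List<String> files, int maxTasks) {
        int numTasks = Math.min(maxTasks, files.size());
        List<Map<String, String>> configs = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            configs.add(new HashMap<>());
        }
        // Round-robin the files across the tasks.
        for (int i = 0; i < files.size(); i++) {
            Map<String, String> cfg = configs.get(i % numTasks);
            cfg.merge("files", files.get(i), (a, b) -> a + "," + b);
        }
        return configs;
    }

    public static void main(String[] args) {
        System.out.println(taskConfigs(Arrays.asList("a.txt", "b.txt", "c.txt"), 2));
        // → [{files=a.txt,c.txt}, {files=b.txt}]
    }
}
```

The runtime, not the connector, then schedules those task configurations across the workers.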
25. Key considerations – partitions, tasks and topics
file.txt
1. Start
2. The beginning
3. The middle
4. Conclusion
5. Ending
6. Finish
26. Key considerations – partitions, tasks and topics
file.txt:
1. Start
2. The beginning
3. The middle
4. Conclusion
5. Ending
6. Finish
SOURCE CONNECTOR → Topic:
Partition 1: 1. Start, 3. The middle, 5. Ending
Partition 2: 2. The beginning, 4. Conclusion, 6. Finish
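The alternating assignment shown above can be sketched directly. In a real deployment the Kafka producer's partitioner decides placement; this stand-in just computes the round-robin split to show where each line lands:

```java
import java.util.*;

// Sketch of the partitioning shown above: lines 1/3/5 land in one partition
// and 2/4/6 in the other. The real distribution is done by the Kafka
// producer's partitioner; this only illustrates the resulting assignment.
public class LinePartitioner {
    public static Map<Integer, List<String>> assign(List<String> lines, int partitions) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (int i = 0; i < lines.size(); i++) {
            byPartition.computeIfAbsent(i % partitions, p -> new ArrayList<>())
                       .add(lines.get(i));
        }
        return byPartition;
    }

    public static void main(String[] args) {
        List<String> file = Arrays.asList("1. Start", "2. The beginning", "3. The middle",
                "4. Conclusion", "5. Ending", "6. Finish");
        System.out.println(assign(file, 2));
        // → {0=[1. Start, 3. The middle, 5. Ending], 1=[2. The beginning, 4. Conclusion, 6. Finish]}
    }
}
```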
27. Key considerations – partitions, tasks and topics
file.txt → SOURCE CONNECTOR → Topic:
Partition 1: 1. Start, 3. The middle, 5. Ending
Partition 2: 2. The beginning, 4. Conclusion, 6. Finish
Topic → SINK CONNECTOR → file-copy.txt:
1. Start
3. The middle
5. Ending
2. The beginning
4. Conclusion
6. Finish
(The copy's line order follows partition consumption, not the original file order)
29. Key considerations – Data formats
EXTERNAL SYSTEM FORMAT ↔ KAFKA CONNECT INTERNAL FORMAT ↔ KAFKA RECORD FORMAT
30. Key considerations – Data formats
EXTERNAL SYSTEM FORMAT ↔ KAFKA CONNECT INTERNAL FORMAT ↔ KAFKA RECORD FORMAT
Connector builders convert between the external system format and the connector's records,
e.g. com.ibm.eventstreams.connect.mqsource.builders.JsonRecordBuilder
e.g. com.ibm.eventstreams.connect.mqsink.builders.JsonMessageBuilder
34. Connector config
@Override
public ConfigDef config() {
    ConfigDef configDef = new ConfigDef();
    configDef.define("config_option", Type.STRING, Importance.HIGH, "Config option.");
    return configDef;
}
$ curl -X PUT -d '{"connector.class":"MyConnector"}' \
    http://localhost:8083/connector-plugins/MyConnector/config/validate
{"configs": [{
  "definition": {"name": "config_option", "importance": "HIGH", "default_value": null, …},
  "value": {
    "errors": ["Missing required configuration \"config_option\" which has no default value."],
    …
}
36. Lifecycle of a connector
Connector: initialize → parse and validate config → create tasks
Source Task: initialize → start(config) → running → stop()
While running: poll(), commit(), commitRecord(record); plus version()
37. Lifecycle of a connector
Connector: initialize → parse and validate config → create tasks
Sink Task: initialize → start(config) → running → stop()
While running: put(records), flush(offsets); plus version()
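The sink-task lifecycle above can be sketched with a simplified stand-in. `SketchSinkTask` is hypothetical, not the real `org.apache.kafka.connect.sink.SinkTask`, and it uses plain strings in place of `SinkRecord`:

```java
import java.util.*;

// Sketch of the sink-task lifecycle: start(config), then repeated
// put()/flush() while running, then stop(). A hypothetical stand-in for
// org.apache.kafka.connect.sink.SinkTask.
public class SketchSinkTask {
    enum State { INITIALIZED, RUNNING, STOPPED }
    private State state = State.INITIALIZED;
    private final List<String> pending = new ArrayList<>();

    public String version() { return "0.0.1"; }

    public void start(Map<String, String> config) {
        // A real task would parse and validate its config here.
        state = State.RUNNING;
    }

    // The framework delivers batches of records while the task is running.
    public void put(Collection<String> records) {
        if (state != State.RUNNING) throw new IllegalStateException("put() while " + state);
        pending.addAll(records);
    }

    // The framework periodically asks the task to flush pending sends.
    public int flush() {
        if (state != State.RUNNING) throw new IllegalStateException("flush() while " + state);
        int committed = pending.size();
        pending.clear();
        return committed;
    }

    public void stop() { state = State.STOPPED; }
}
```

The state checks make the lifecycle explicit: `put()` and `flush()` are only legal between `start()` and `stop()`.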
39. It’s easy to connect IBM MQ to Apache Kafka
IBM has created a pair of connectors, available as source code or as part of IBM Event
Streams
Source connector
From MQ queue to Kafka topic
https://github.com/ibm-messaging/kafka-connect-mq-source
Sink connector
From Kafka topic to MQ queue
https://github.com/ibm-messaging/kafka-connect-mq-sink
Fully supported by IBM for customers with support entitlement for IBM Event Streams
40. Where can I get these magnificent connectors?
https://ibm.github.io/event-streams/connectors/
41. Connecting IBM MQ to Apache Kafka
The connectors are deployed into a Kafka Connect runtime
This runs between IBM MQ and Apache Kafka
[Diagram: MQ clients put to TO.KAFKA.Q and get from FROM.KAFKA.Q on IBM MQ; one Kafka Connect worker runs the MQ SOURCE CONNECTOR (queue TO.KAFKA.Q → topic FROM.MQ.TOPIC) and another runs the MQ SINK CONNECTOR (topic TO.MQ.TOPIC → queue FROM.KAFKA.Q)]
42. Running Kafka Connect on a mainframe
IBM MQ Advanced for z/OS VUE provides support for the Kafka Connect workers to be deployed onto z/OS Unix System Services using bindings connections to MQ
[Diagram: as above, but the Kafka Connect workers run in Unix System Services and use BINDINGS connections to IBM MQ Advanced for z/OS VUE; source: TO.KAFKA.Q → FROM.MQ.TOPIC, sink: TO.MQ.TOPIC → FROM.KAFKA.Q]
44. MQ sink connector
[Diagram: Kafka record from topic TO.MQ.TOPIC (key and value as byte[]) → Converter → SinkRecord (schema plus value, which may be complex) → MessageBuilder → MQ message (MQMD, optional MQRFH2, payload) put to queue FROM.KAFKA.Q by the MQ SINK CONNECTOR]
45. Sink task – Design
The sink connector is relatively simple
The interface is synchronous and fits MQ quite well
Balancing efficiency against resource limits is the key
put(Collection<SinkRecord> records)
Converts Kafka records to MQ messages and sends them in a transaction
Always requests a flush to avoid hitting MQ transaction limits
flush(Map<TopicPartition, OffsetAndMetadata> currentOffsets)
Commits any pending sends
This batches messages into MQ without excessively large batches
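One way to picture the put/flush batching described above: `put()` sends records inside an open transaction and signals when a flush should be requested so the transaction stays below a size limit, and `flush()` commits everything sent since the last commit. `BatchingSink` and its limit are illustrative stand-ins, not IBM MQ's actual API:

```java
import java.util.*;

// Sketch of the put()/flush() batching: sends accumulate in an open
// "transaction"; flush() commits them. The limit is a stand-in for an
// MQ-style maximum number of messages per transaction.
public class BatchingSink {
    private final int maxUncommitted;
    private int uncommitted = 0;

    public BatchingSink(int maxUncommitted) { this.maxUncommitted = maxUncommitted; }

    // "Send" the records inside the open transaction; return true when the
    // task should request an immediate flush to keep the transaction small.
    public boolean put(List<String> records) {
        uncommitted += records.size();
        return uncommitted >= maxUncommitted;
    }

    // Commit the transaction covering everything sent since the last flush.
    public int flush() {
        int committed = uncommitted;
        uncommitted = 0;
        return committed;
    }
}
```

This is the trade-off the slide names: batches amortize transaction cost, while the limit keeps any single MQ transaction from growing too large.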
47. MQ source connector
[Diagram: MQ message (MQMD, optional MQRFH2, payload) read from queue TO.KAFKA.Q by the MQ SOURCE CONNECTOR → RecordBuilder → SourceRecord (null key; schema plus value, which may be complex) → Converter → Kafka record for topic TO.MQ.TOPIC (value as byte[])]
48. Source task – Original design
The source connector is more complicated
It's multi-threaded and asynchronous, which fits MQ less naturally
List<SourceRecord> poll()
Waits up to 30 seconds for MQ messages, which are returned as a batch
Multiple calls to poll() could contribute to a single MQ transaction
commit()
Asynchronously commits the active MQ transaction
This works quite well, but commit() is too infrequent under load, which causes throttling
commit() does ensure that the most recent batch of polled messages has been acked by Kafka, but it doesn't quite feel like the right way to do it
49. Source task – Revised design
Changed so that each call to poll() comprises a single MQ transaction
commit() is no longer used in normal operation
List<SourceRecord> poll()
Waits for the records from the previous poll() to be acked by Kafka
Commits the active MQ transaction – the previous batch
Waits up to 30 seconds for MQ messages, which are returned as a new batch
commitRecord(SourceRecord record)
Just counts up the acks for the records sent
MQ transactions are much shorter-lived
No longer throttles under load
Feels a much better fit for the design of Kafka Connect
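The revised flow can be sketched with counters standing in for MQ and Kafka: each `poll()` first checks that the previous batch is fully acked, commits the MQ transaction covering it, then gathers a new batch. `SketchSourceTask` is a hypothetical stand-in (a real task would block rather than throw while waiting for acks):

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the revised source-task design: one MQ transaction per poll(),
// committed at the start of the next poll() once Kafka has acked the batch.
public class SketchSourceTask {
    private final Deque<String> queue = new ArrayDeque<>();  // stand-in for the MQ queue
    private int inFlight = 0;                                // records from the previous poll()
    private final AtomicInteger acked = new AtomicInteger(); // bumped by commitRecord()
    public int commits = 0;                                  // MQ transactions committed

    public void enqueue(String msg) { queue.add(msg); }

    // Called by the framework after Kafka acks each produced record.
    public void commitRecord(String record) { acked.incrementAndGet(); }

    public List<String> poll() {
        // 1. Wait until the previous batch is fully acked (sketched as a check).
        if (acked.get() < inFlight) throw new IllegalStateException("previous batch not yet acked");
        // 2. Commit the MQ transaction covering the previous batch.
        if (inFlight > 0) commits++;
        acked.set(0);
        // 3. Read a new batch from MQ inside a fresh transaction.
        List<String> batch = new ArrayList<>(queue);
        queue.clear();
        inFlight = batch.size();
        return batch;
    }
}
```

Because the transaction for batch N commits at the start of poll N+1, each MQ transaction is short-lived, which is exactly why the revised design stops throttling under load.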
50. Stopping a source task is tricky
stop() is called on SourceTask to indicate that the task should stop
It runs asynchronously with respect to the polling and commit threads
We can't be sure whether poll() or commit() is currently active or will be called very soon
Since poll() and commit() may both want access to the MQ connection, it's not clear when it's safe to close it
KIP-419: Safely notify Kafka Connect SourceTask is stopped
Adds a stopped() method to SourceTask that is guaranteed to be the final call to the task
[State diagram: uninitialized → initialize() → initialized → start() → running → stop() → stopping → stopped(); poll(), commit() and commitRecord() can still be in flight in both the running and stopping states]
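The race above can be sketched with a flag: `stop()` only signals, because `poll()` may still be using the connection, and the KIP-419-style `stopped()` callback is the one place where closing is guaranteed safe. `StoppableTask` and its simulated connection are hypothetical stand-ins:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of the stop() race: stop() flips a flag asynchronously while poll()
// shares the "MQ connection"; stopped() (per KIP-419) is guaranteed to be the
// final call, so only then is it safe to close the connection.
public class StoppableTask {
    private final AtomicBoolean stopping = new AtomicBoolean(false);
    private boolean connectionOpen = true;

    public boolean poll() {
        // May run concurrently with stop(); only proceed if not stopping.
        if (stopping.get()) return false;
        return connectionOpen;  // a real task would use the MQ connection here
    }

    public void stop() {
        // Asynchronous signal: cannot close the connection here, because
        // poll() or commit() may still be using it.
        stopping.set(true);
    }

    // KIP-419: guaranteed final call to the task, so closing is safe now.
    public void stopped() {
        connectionOpen = false;
    }

    public boolean isClosed() { return !connectionOpen; }
}
```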
54. Thank you
Kate Stanley – @katestanley91
Andrew Schofield – https://medium.com/@andrew_schofield
Kafka Connect: https://kafka.apache.org/documentation/#connect
MQ connectors: https://github.com/ibm-messaging/kafka-connect-mq-source
https://github.com/ibm-messaging/kafka-connect-mq-sink
IBM Event Streams: https://ibm.github.io/event-streams/connectors/
ibm.com/cloud/event-streams