This document summarizes a presentation about designing an automated data transport system using Kafka Connect and KSQL. The system handles complex data dependencies across tables while respecting business rules. It establishes a feedback loop to notify the source system about successful deliveries and errors. The solution uses Kafka components like streams and tables, source and sink connectors, and microservices. Data is extracted from multiple source tables and written to topics. Sink connectors load the data to the destination. KSQL is used for aggregations to declare when an event is complete or needs alerting. The design leverages existing tools with minimal custom coding.
Designing a Feedback Loop for Event-Driven Data Sharing With Teresa Wang | Current 2022
1. Designing a Feedback Loop for Event-Driven Data Sharing Enabled with Kafka Connect and KSQL
Presented by: Teresa Wang
weijiuan.t.wang@jpl.nasa.gov
Enterprise Business Information Services Division (EBIS)
Jet Propulsion Laboratory
Oct. 4th 2022
3. Disclaimer
Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not constitute or imply its endorsement by the United States Government or the Jet Propulsion Laboratory, California Institute of Technology.
4. Tasks: Design a fully automated data transport system that …
• Handles complex data dependencies across tables while respecting business and atomicity requirements between the source and target systems
• Establishes a robust feedback loop to properly notify the sending system about successful deliveries and to identify/remediate errors
[Diagram] Earned Value Management System: Scope, Budgets, Schedule
5. Challenges
• The triggering event is recorded in a staging table, and the data related to the triggering event are located in 9 separate tables with foreign-key relationships
• Some source tables have a large data structure (e.g., > 120 data elements)
– A single source connector cannot use multi-joins for query-based polling across all involved source tables, so each table needs its own connector (see the sketch below)
– Certain data elements require data-type recasting
• Determining when the event-based data transport has completed and been received by the destination
– Each source table contains a varying amount of data per event: as few as dozens of rows, or more than 100,000
– How can an error during data transport be detected?
• Triggering downstream processing in the source and destination systems upon event completion
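A minimal sketch of one such per-table connector, written as a ksqlDB CREATE SOURCE CONNECTOR statement. The connection URL, table, and column names here are illustrative assumptions, not details from the talk:

  -- One connector per source table; a hypothetical BUDGETS table is shown.
  CREATE SOURCE CONNECTOR budgets_source WITH (
    'connector.class'          = 'io.confluent.connect.jdbc.JdbcSourceConnector',
    'connection.url'           = 'jdbc:oracle:thin:@//evm-host:1521/EVMDB',
    'mode'                     = 'timestamp+incrementing',
    'timestamp.column.name'    = 'LAST_MODIFIED',
    'incrementing.column.name' = 'ROW_ID',
    -- A per-table query also handles data-type recasting via CAST.
    'query' = 'SELECT ROW_ID, EVENT_ID, CAST(AMOUNT AS DECIMAL(18,2)) AS AMOUNT, LAST_MODIFIED FROM BUDGETS',
    'topic.prefix'             = 'BUDGETS',
    'poll.interval.ms'         = '5000'
  );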
6. Design Principles
• Maximizing the usage of Kafka components
– To ensure data consistency and integrity
– Extend/augment as necessary
• Separation of concerns
– Data transport abstraction means …
• No need for software engineers to write Kafka producers or consumers
• Kafka Connect Single Message Transforms (SMTs) can handle label translation for business domain-specific terms (a sketch follows this list)
• Maintainability & reusability
– Prefer configuration over coding for change requests
– Future-proof, with enabling data pipeline extensions in mind
– Framework is reusable for other use cases
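As noted above, label translation can live in connector configuration rather than application code. A minimal sketch using the stock ReplaceField SMT; the connector name, topic, and field names are illustrative assumptions:

  -- Rename source-system field labels to the destination's business terms.
  CREATE SINK CONNECTOR program_sink WITH (
    'connector.class' = 'io.confluent.connect.jdbc.JdbcSinkConnector',
    'connection.url'  = 'jdbc:postgresql://dest-host:5432/target',
    'topics'          = 'PROGRAM',
    'transforms'      = 'relabel',
    'transforms.relabel.type'    = 'org.apache.kafka.connect.transforms.ReplaceField$Value',
    'transforms.relabel.renames' = 'PRG_CD:PROGRAM_CODE,BDGT_AMT:BUDGET_AMOUNT'
  );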
7. Solution Design Components
• KSQL: streams and KTables
• Control manifest
• Wiretapping
• JDBC
• Java microservices
• Confluent connectors
8. The Basis: A Typical Data Pipeline Design with Source/Sink Connectors
[Diagram] EVM sys. (event table + 9 data tables) → 9 source connectors → Kafka (9 topics) → 9 sink connectors → destination (9 data tables). On both the source and destination sides, an open question remains: ? Event status ?
9. Visualized Data Pipeline Design: feedback flows
[Diagram] Source connectors feed the topic PROGRAM (the manifest) and the other data topics; wiretapped sink connectors deliver the data and emit counts to the topic WIRETAPS.
• Topic PROGRAM → KSQL-Stream PROGRAM_S → KSQL-Table PROGRAM_T (pivoted)
• Topic WIRETAPS → KSQL-Stream WIRETAPS_S → KSQL-Table WIRETAPS_T
• KSQL-Table/Topic EVENT_COMPLETION_T (compares running totals in WIRETAPS_T with control totals on the manifest in PROGRAM_T)
• Watcher microservice → Topic PENDING_ALERTS → KSQL-Stream PENDING_ALERTS_S → KSQL-Table/Topic ALERTS_T
• Sink connectors and the notifications microservice consume the EVENT_COMPLETION_T and ALERTS_T topics
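A minimal KSQL sketch of the completion check in this diagram, assuming the manifest on the PROGRAM topic carries an expected row count per event; all column names, value formats, and data types are illustrative assumptions:

  -- Manifest: latest expected (control) total per event.
  CREATE STREAM PROGRAM_S (EVENT_ID VARCHAR KEY, EXPECTED_ROWS BIGINT)
    WITH (KAFKA_TOPIC='PROGRAM', VALUE_FORMAT='JSON');
  CREATE TABLE PROGRAM_T AS
    SELECT EVENT_ID, LATEST_BY_OFFSET(EXPECTED_ROWS) AS EXPECTED_ROWS
    FROM PROGRAM_S GROUP BY EVENT_ID;

  -- Wiretap records: running totals of rows written by the sink connectors.
  CREATE STREAM WIRETAPS_S (EVENT_ID VARCHAR KEY, ROWS_WRITTEN BIGINT)
    WITH (KAFKA_TOPIC='WIRETAPS', VALUE_FORMAT='JSON');
  CREATE TABLE WIRETAPS_T AS
    SELECT EVENT_ID, SUM(ROWS_WRITTEN) AS TOTAL_WRITTEN
    FROM WIRETAPS_S GROUP BY EVENT_ID;

  -- Declare completion when running totals match the manifest's control totals.
  CREATE TABLE EVENT_COMPLETION_T AS
    SELECT w.EVENT_ID, w.TOTAL_WRITTEN
    FROM WIRETAPS_T w
    JOIN PROGRAM_T p ON w.EVENT_ID = p.EVENT_ID
    WHERE w.TOTAL_WRITTEN = p.EXPECTED_ROWS;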
10. Summary – An Enterprise Integration Design Pattern
Supervisory Structure
• Acquires from the source a manifest for each event
• Reports unmet expectations with a continuously running “watcher” microservice
• Keeps producing “pending” alerts until an event is either completed or erred
Wiretapped Sink Connectors
• Capture the number of messages written to the destination during sink-connector deliveries
KSQL Aggregations
• Declare event completion when expectations are met
• Declare an “alert” when the number of pending alerts exceeds a preset threshold (a sketch follows this list)
Producing Feedback
• Consumes from both the event-completions (successes) and event-errors (alerts) topics
• Sends feedback to both the source and target systems to trigger further processing
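A minimal sketch of that alert aggregation, assuming the watcher microservice writes one record per unmet check to PENDING_ALERTS; the threshold value and column names are illustrative assumptions:

  CREATE STREAM PENDING_ALERTS_S (EVENT_ID VARCHAR KEY, CHECKED_AT BIGINT)
    WITH (KAFKA_TOPIC='PENDING_ALERTS', VALUE_FORMAT='JSON');

  -- Declare an "alert" once pending alerts for an event exceed the threshold.
  CREATE TABLE ALERTS_T AS
    SELECT EVENT_ID, COUNT(*) AS PENDING_COUNT
    FROM PENDING_ALERTS_S
    GROUP BY EVENT_ID
    HAVING COUNT(*) > 10;  -- preset threshold (assumed value)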
11. Summary – Why We Like This Solution Design
• Data transport abstraction: software engineers don’t need to code
– Kafka producers or consumers
– Web services for data-exchange validation with the target system
• Leveraging KSQL aggregations for
– Declaring an event “completion”, or
– Declaring an event “alert”
• A minimalist approach – less (coding by data engineers) is more!
– Extending JDBC to enable wiretapping on sink connectors based on configurable attributes (a hypothetical sketch follows this list)
– A single-purpose “watcher” microservice to produce “pending alerts”
– Employing an existing “notification” microservice for multi-channel broadcasting about “alerts”
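The wiretap extension itself is custom JPL code, so its configuration is not public. Purely as a hypothetical illustration of “configurable attributes” on an extended JDBC sink, the wiretap.* property names below are invented for this sketch:

  CREATE SINK CONNECTOR budgets_sink WITH (
    'connector.class'   = 'io.confluent.connect.jdbc.JdbcSinkConnector',  -- extended variant assumed
    'connection.url'    = 'jdbc:postgresql://dest-host:5432/target',
    'topics'            = 'BUDGETS',
    'wiretap.enabled'   = 'true',       -- hypothetical custom attribute
    'wiretap.topic'     = 'WIRETAPS',   -- hypothetical custom attribute
    'wiretap.key.field' = 'EVENT_ID'    -- hypothetical custom attribute
  );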
12. Acknowledgements
Jacob Nowicki for introducing Kafka to the EBIS Division at JPL!!
Peter Grzegorczyk for his innovative engineering collaboration!!