This document discusses Neo4j Streams, which enables real-time streaming of Neo4j database changes to Apache Kafka. It includes a change data capture plugin that streams transaction events from Neo4j to Kafka, a sink plugin that ingests data from Kafka into Neo4j based on custom rules, and procedures to consume and produce data directly from Cypher. The presenters demonstrate how Neo4j Streams can be used to build real-time data pipelines and streaming applications integrated with Neo4j. They encourage attendees to try the integration and provide feedback.
2. LARUS Business Automation Srl Italy’s #1 Neo4j Partner
WHO ARE WE?
Andrea
[:WORKS_AT]
[:LOVES]
[:INTEGRATOR_LEADER_FOR]
Michael
[:WORKS_AT]
[:WORKS_WITH]
Agenda
● Introduction
○ Partnership Neo4j and Larus
○ Partnership with Confluent
● What is Neo4j Streams?
○ What is Apache Kafka?
○ How did we combine Neo4j and Kafka?
● The Kafka Connect Sink Plugin
○ DEMO
● Questions
COLLABORATING FOR NEO4J USERS
● 2011: First spikes in retail for articles’ clustering
● 2014
● 2015
● 2016: Neo4j JDBC Driver
● 2018: Neo4j APOC, ETL, Spark, Zeppelin, Kafka
● 2019: Kafka commercial, GraphQL
NEO4J STREAMS: HISTORY
Andrea
[:AUTHOR_OF][:CREATOR_OF]
X
Michael
Great partnership -> Quick Developments
● Frequently requested by Neo4j users
● 2018
○ October: started experiment
○ December: first release
● 2019
○ January: Kafka Connect plugin
○ February: article on the Confluent blog
○ March: several articles by Andrea
○ April: Confluent partnership
○ May: Gold certification of the plugin
○ May: commercial support through Larus
Benefits
● Avoid custom "hacky" solutions
● Deployed by Neo4j Field Engineering
● Used by many customers (hardened)
● Continuous development
● Quick response to issues
● Officially supported by Confluent and Neo4j through Larus
Neo4j - Kafka Integration - Use Cases
HOW CAN IT BE USED?
● read and write Kafka data directly from Neo4j operations
● change data capture: stream graph changes into larger
architectures, e.g. to feed microservices or other
databases
● exchange data/updates between distinct Neo4j
installations, e.g. from analytics
● active/active setups of two Neo4j clusters
● integrate with existing Kafka architectures of customers
● use other Kafka connectors to offer more Neo4j
integrations
● build just-in-time data warehouses with Spark & Hadoop
What is Apache Kafka?
A DISTRIBUTED STREAMING PLATFORM
Has three key capabilities:
● Publish and subscribe to streams of records;
● Store streams of records in a fault-tolerant
durable way;
● Process streams of records as they occur.
What is Apache Kafka?
HOW DOES IT WORK?
1. TOPICS: a topic is a category or feed name to
which records are published.
2. PARTITIONS: for each topic, the Kafka cluster
maintains a partitioned, distributed, persistent log.
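The idea of a topic backed by a partitioned, append-only log can be sketched in a few lines of Python. This is a toy in-memory model, not Kafka's implementation: the `ToyTopic` class and its methods are hypothetical names, and the CRC32 hash is only a stand-in for whatever partitioner a real producer uses.

```python
import zlib


class ToyTopic:
    """A toy, in-memory stand-in for one topic split into partitions."""

    def __init__(self, name, num_partitions=3):
        self.name = name
        # Each partition is an append-only list; the list index is the offset.
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        # Hash the key so records with the same key land in the same partition.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1  # (partition, offset)

    def read(self, partition, offset=0):
        # Reading never removes records: a consumer only tracks its own offset.
        return self.partitions[partition][offset:]


topic = ToyTopic("sales")
first = topic.publish("customer-1", "order A")
second = topic.publish("customer-1", "order B")
```

Both records share a key, so they end up in the same partition with consecutive offsets, which is what preserves per-key ordering in this model.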
What is Apache Kafka?
HOW IS IT USED?
Kafka is generally used for two classes of
applications:
● Building real-time streaming data pipelines;
● Building real-time streaming applications.
Apache Kafka - Use Cases
FOR WHICH APPS IS IT USED?
High variety of applications:
Capture
● real world events, e.g. from sensors
● human events, e.g. retail transactions
● database events - change data capture
Architecture
● Integrate & decouple different systems
● Robust "Message Bus"
● Filter, project, combine different streams
● The "Event Log" is the truth
What is Neo4j Streams?
ENABLES DATA STREAMING ON NEO4J
The project is a Neo4j plugin composed of several parts:
● Neo4j Streams Change Data Capture;
● Neo4j Streams Sink;
● Neo4j Streams Procedures.
We also have a Kafka Connect plugin:
● Kafka Connect Sink plugin.
Neo4j Streams: Change Data Capture
Change data “what”?
In databases, Change Data Capture (CDC) is a set of software design patterns used to determine (and
track) the data that has changed so an action can be taken using the changed data.
Well-suited use cases?
● CDC solutions occur most often in data-warehouse environments;
● CDC allows replicating a database with little or no performance impact on its operation.
Neo4j Streams: Change Data Capture
How does it work?
Once the plugin is installed and your Kafka endpoints are configured, it is automatically up and running
after Neo4j starts. Each transaction communicates its changes to our event listener, which:
● exposes creations, updates and deletes of Nodes, Relationships and Properties
● provides before-and-after information
● provides schema information
● supports configurable property filtering for each topic
These events are sent to Kafka asynchronously, so the commit path should not be affected.
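The two ideas on this slide, before-and-after events with property filtering and an asynchronous hand-off that keeps publishing off the commit path, can be sketched as follows. This is an illustrative model only: the event shape, `build_change_event`, the `outbox` queue and the `sent` list are all assumptions made for the example, not the plugin's actual event format or internals.

```python
import queue
import threading


def build_change_event(operation, before, after, include=None):
    """Build a toy change event carrying before/after state for one node."""
    def keep(props):
        if props is None:
            return None
        if include is None:
            return dict(props)
        # Property filtering: only the listed properties leave the database.
        return {k: v for k, v in props.items() if k in include}

    return {"operation": operation, "before": keep(before), "after": keep(after)}


outbox = queue.Queue()  # hand-off buffer between "commit" and "publisher"
sent = []               # stands in for the Kafka topic


def publisher():
    # Runs in the background so the committing thread never waits on delivery.
    while True:
        event = outbox.get()
        if event is None:  # sentinel: stop the worker
            break
        sent.append(event)


worker = threading.Thread(target=publisher)
worker.start()

# "Commit path": enqueue the event and return immediately.
outbox.put(build_change_event(
    "updated",
    before={"name": "Andrea", "secret": "x"},
    after={"name": "Andrea", "company": "LARUS", "secret": "y"},
    include={"name", "company"},
))
outbox.put(None)
worker.join()
```

The committing code only pays for a queue insert; delivery happens on the worker thread, which is the property the slide describes.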
Neo4j Streams: Sink
INGEST YOUR DATA, WITH YOUR RULES
Initially, we thought about a generic consumer with a fixed projection of events into Nodes and
Relationships.
Instead, we decided to give users the power to define a custom import statement
per topic that turns events into arbitrary graph structures.
(event)-[:TO]->(graph)
HOW DOES IT WORK?
Configure an import statement for each Kafka topic:
streams.sink.topic.cypher.<TOPIC>=<CYPHER_STATEMENT>
For example:
streams.sink.topic.cypher.sales=
MATCH (c:Customer {id: event.start.id})
MATCH (p:Product {id: event.end.id})
MERGE (c)-[:PLACED]->(o:Order)-[:FOR]->(p)
SET o += event.properties
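To make the per-topic mapping concrete, here is a minimal sketch of how such settings could be turned into one statement per topic. The helper names `topic_statements` and `batched_query` are hypothetical, and wrapping the statement in `UNWIND $events AS event` is an assumption about how a batch of messages might be bound to the `event` variable, not a description of the plugin's internals.

```python
PREFIX = "streams.sink.topic.cypher."


def topic_statements(config):
    """Map each topic name to the Cypher statement configured for it."""
    return {
        key[len(PREFIX):]: statement.strip()
        for key, statement in config.items()
        if key.startswith(PREFIX)
    }


def batched_query(statement):
    # Assumption: the batch is passed as an $events parameter and unwound so
    # that the configured statement sees one `event` at a time.
    return "UNWIND $events AS event\n" + statement


config = {
    "streams.sink.topic.cypher.sales": (
        "MATCH (c:Customer {id: event.start.id}) "
        "MATCH (p:Product {id: event.end.id}) "
        "MERGE (c)-[:PLACED]->(o:Order)-[:FOR]->(p) "
        "SET o += event.properties"
    ),
    "some.other.setting": "ignored",
}

statements = topic_statements(config)
query = batched_query(statements["sales"])
```

Keeping the statement per topic means each feed can build a different graph shape without any change to the consumer code.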
Kafka Connect
WHAT IS KAFKA CONNECT?
Kafka Connect, an open source component of Apache Kafka, is a
framework for connecting Kafka with external
systems such as databases, key-value stores,
search indexes, and file systems.
Neo4j Streams: Kafka Connect Sink
HOW DOES IT WORK?
It works in exactly the same way as the Neo4j Sink
plugin, so you can provide your own ingestion setup
for each topic.
You can download it from the Confluent Hub!
Neo4j Streams: Streams Procedures
CONSUME/PRODUCE DATA DIRECTLY FROM CYPHER
The Neo4j Streams project comes with two procedures:
● streams.publish: streams custom messages from Neo4j to the configured environment,
using the underlying configured Producer;
● streams.consume: consumes messages from a given topic.
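As a rough sketch, the two procedures could be invoked from application code by building parameterized Cypher calls. The argument order (topic first, then payload or configuration), the `YIELD event` column, and the helper names `publish_call` and `consume_call` are assumptions for illustration; check the project documentation for the exact signatures.

```python
def publish_call(topic, payload):
    """Return (query, parameters) for publishing one payload to a topic."""
    return "CALL streams.publish($topic, $payload)", {
        "topic": topic,
        "payload": payload,
    }


def consume_call(topic, config=None):
    """Return (query, parameters) for reading events from a topic."""
    query = "CALL streams.consume($topic, $config) YIELD event RETURN event"
    return query, {"topic": topic, "config": config or {}}


query, params = publish_call("my-topic", "Hello from Neo4j")
# With a Neo4j driver session, this pair would typically be passed along as
# session.run(query, params); no database is contacted in this sketch.
```

Passing the topic and payload as parameters rather than concatenating them into the query string avoids quoting problems with arbitrary message content.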
Neo4j Streams: Lessons learned
THE POWER OF THE STREAM!
● We have seen how to use the CDC in order to
stream transaction events from Neo4j to other
systems;
● We have seen how to use the SINK in order to
ingest data into Neo4j by providing our own
business rules;
● We have seen how to use the Streams
PROCEDURES in order to consume/produce
data directly from Cypher.
New Features
COMMUNITY DRIVEN DEVELOPMENT
Since the first release of the plugin we have gathered a lot of feedback from the community,
which led us to create cool new features:
● Round-trip source/sink from Neo4j --> Neo4j
● A pattern for extracting information from any JSON document
● Manual commit management of offsets
● Dead Letter queue for error handling
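The dead letter queue item is a general error-handling pattern: a record that cannot be processed is set aside with its error instead of stopping the whole stream. A minimal sketch of that pattern follows; `process_batch`, `ingest`, and the two lists are hypothetical stand-ins, not the plugin's implementation.

```python
import json


def process_batch(records, handle, dead_letters):
    """Apply handle() to each record; divert failures instead of stopping."""
    processed = 0
    for record in records:
        try:
            handle(record)
            processed += 1
        except Exception as error:
            # Keep the original record plus the reason it failed, so it can
            # be inspected or replayed later.
            dead_letters.append({"record": record, "error": str(error)})
    return processed


graph = []         # stands in for the ingesting database
dead_letters = []  # stands in for the dead letter topic


def ingest(record):
    graph.append(json.loads(record))  # raises on malformed JSON


ok = process_batch(['{"id": 1}', "not json", '{"id": 2}'], ingest, dead_letters)
```

The malformed record ends up in `dead_letters` while the two valid records on either side of it are still ingested.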
GIVE US FEEDBACK
PLEASE PROVIDE US FEEDBACK
If you plan to use the Streams plugin, please give us feedback!
https://github.com/neo4j-contrib/neo4j-streams