19. Data Ingestion Platform (DiP)
DiP using Storm
• Multiple processing paradigms - real-time, interactive and batch processing
• Reliable - each unit of data (tuple) is processed at least once or exactly once.
• Fast and scalable - parallel calculations run across a cluster of machines.
• Fault-tolerant - workers are automatically restarted if they die.
Apache Storm features
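The "at least once" guarantee above can be illustrated with a small replay loop. This is a minimal sketch in plain Python, not Storm's API: `process_at_least_once`, `flaky_handler` and `max_attempts` are hypothetical names chosen for illustration, and Storm's real mechanism (acker tasks tracking tuple trees) is more involved.

```python
from collections import deque


def process_at_least_once(tuples, handler, max_attempts=3):
    """Replay each tuple until the handler acknowledges it (returns True).

    Hypothetical helper: a tuple that fails is re-queued, so it may be
    processed more than once, which is what "at least once" means.
    """
    pending = deque((item, 1) for item in tuples)
    acked, failed = [], []
    while pending:
        item, attempt = pending.popleft()
        try:
            ok = handler(item)
        except Exception:
            ok = False  # treat an exception like a failed acknowledgement
        if ok:
            acked.append(item)
        elif attempt < max_attempts:
            pending.append((item, attempt + 1))  # replay later
        else:
            failed.append(item)  # give up after max_attempts tries
    return acked, failed


# Usage: a handler that fails the first time it sees "b".
seen = {}


def flaky_handler(item):
    seen[item] = seen.get(item, 0) + 1
    return not (item == "b" and seen[item] == 1)


acked, failed = process_at_least_once(["a", "b", "c"], flaky_handler)
```

Because "b" is replayed after its first failure, the handler sees it twice. Downstream logic therefore has to tolerate duplicates unless exactly-once semantics are used.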
20. Data Ingestion Platform (DiP)
DiP using Spark Streaming
• Multiple processing paradigms - batch and interactive
• Ease of use - provides high-level operators in Java, Scala and Python
• Fault tolerance - lost work and operator state can be recovered with no extra code
• Code reusability - the same code can be used for batch processing, to join streams against historical data, or to run ad-hoc queries on stream state
Spark Streaming features
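The code reusability point can be sketched as follows. This is plain Python and deliberately does not use the Spark API: `count_words` and `micro_batches` are hypothetical helpers, and the only assumption carried over from Spark Streaming is that a stream is handled as a sequence of small batches.

```python
from collections import Counter


def count_words(lines):
    """Business logic written once: count words in a collection of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def micro_batches(lines, size):
    """Hypothetical stand-in for a stream: yield the input in small batches."""
    for i in range(0, len(lines), size):
        yield lines[i:i + size]


history = ["storm spark apex", "spark kafka"]

# Batch use: run the function over the full historical data set.
batch_counts = count_words(history)

# Streaming use: run the same function on each micro-batch and fold
# the results into running state.
stream_state = Counter()
for batch in micro_batches(history, 1):
    stream_state += count_words(batch)
```

Both paths call the same `count_words` function, so the streaming state ends up equal to the batch result once every record has arrived.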
21. Data Ingestion Platform (DiP)
DiP using Apex
• Modular - Malhar, a library of operators, comes bundled with Apex for quick development cycles
• Supports both stream and batch processing
• Supports operator exchange at runtime
• Supports fault tolerance and dynamic scaling
Apache Apex features
31. Kafka Mirroring
Xavient Corporate Overview
The Kafka mirroring feature creates a replica of an existing cluster, for example to
replicate an active datacenter into a passive datacenter. Kafka provides the MirrorMaker tool for
mirroring a source cluster into a target cluster.
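Conceptually, mirroring pairs a consumer on the source cluster with a producer on the target cluster. The sketch below models that idea in memory only; it is not MirrorMaker and uses no Kafka client library. The `mirror` function, the dict-based "clusters" and the `offsets` map are all hypothetical names for illustration.

```python
def mirror(source, target, offsets, topics):
    """Copy records not yet mirrored from source topics into the target.

    source, target: dicts mapping topic name -> list of records (a stand-in
    for a cluster). offsets: dict remembering how far each topic has been
    mirrored, so repeated calls copy only new records.
    """
    copied = 0
    for topic in topics:
        log = source.get(topic, [])
        start = offsets.get(topic, 0)
        for record in log[start:]:
            target.setdefault(topic, []).append(record)
            copied += 1
        offsets[topic] = len(log)
    return copied


# Usage: mirror only the "events" topic from the active to the passive side.
active = {"events": ["e1", "e2"], "logs": ["l1"]}
passive = {}
offsets = {}

first_pass = mirror(active, passive, offsets, ["events"])

active["events"].append("e3")  # new data arrives in the active datacenter
second_pass = mirror(active, passive, offsets, ["events"])
```

The topic list plays the role of a topic filter: "logs" is never copied because it was not selected for mirroring.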