SlideShare uma empresa Scribd logo
1 de 53
@nicolas_frankel
A gentle introduction to Stream
Processing
Nicolas Fränkel
@nicolas_frankel
Me, myself and I
 Former developer, team lead, architect,
blah-blah
 Developer Advocate
 Lots of crazy creative ideas
@nicolas_frankel
Hazelcast
HAZELCAST IMDG is an operational,
in-memory, distributed computing
platform that manages data using
in-memory storage and performs
execution for breakthrough
and scale.
HAZELCAST JET is the ultra
fast, application embeddable,
3rd generation stream
processing engine for low
latency batch and stream
processing.
@nicolas_frankel
Schedule
 Why streaming?
 Streaming approaches
 Hazelcast Jet
 Open Data
 General Transit Feed Specification
 The demo!
 Q&A
@nicolas_frankel
In a time before our time…
Data was neatly stored in SQL databases
@nicolas_frankel
The need for Extract Transform Load
 Analytics
• Supermarket sales in the last hour?
 Reporting
• Banking account annual closing
@nicolas_frankel
What SQL really means
 Constraints
 Joints
 Normal forms
@nicolas_frankel
Writes vs. reads
 Normalized vs. denormalized
 Correct vs. fast
@nicolas_frankel
The need for ETL
 Different actors
 With different needs
 Using the same database?
@nicolas_frankel
The batch model
1. Extract
2. Transform
3. Load
@nicolas_frankel
Batches are everywhere!
@nicolas_frankel
Properties of batches
 Scheduled at regular intervals
• Daily
• Weekly
• Monthly
• Yearly
 Run in a specific amount of time
@nicolas_frankel
Oops
 When the execution time overlaps the
next execution schedule
 When the space taken by the data
exceeds the storage capacity
 When the batch fails mid-execution
 etc.
@nicolas_frankel
Big data!
 Parallelize everything
• Map - Reduce
• Hadoop
 NoSQL
• Schema on Read vs. Schema on Write
@nicolas_frankel
Or chunking?
 Keep a cursor
• And only manage “chunks” of data
 What about new data coming in?
@nicolas_frankel
Event-Driven Programming
“Programming paradigm in which the flow of
the program is determined by events such
as user actions (mouse clicks, key presses),
sensor outputs, or messages from other
programs or threads”
-- Wikipedia
@nicolas_frankel
Event Sourcing
“Event sourcing persists the state of a business entity
such an Order or a Customer as a sequence of state-
changing events. Whenever the state of a business
entity changes, a new event is appended to the list of
events. Since saving an event is a single operation, it is
inherently atomic. The application reconstructs an entity’s
current state by replaying the events.”
-- https://microservices.io/patterns/data/event-sourcing.html
@nicolas_frankel
Database internals
 Ordered append-only log
• e.g. MySQL binlog
@nicolas_frankel
Make everything event-based!
@nicolas_frankel
Benefits
 Memory-friendly
 Easily processed
 Pull vs. push
• Very close to real-time
• Keeps derived data in-sync
@nicolas_frankel
From finite datasets to infinite
@nicolas_frankel
Streaming is smart ETL
Processing
Ingest
In-Memory
Operational
Storage
Combine
Join, Enrich,
Group, Aggregate
Stream
Windowing,
Event-Time
Processing
Compute
Distributed and
Parallel
Computation
Transform
Filter, Clean,
Convert
Publish
In-Memory,
Subscriber
Notifications
Notify if response
time is 10% over 24
hour average, second
by second
@nicolas_frankel
Use Case: Analytics and Decision Making
 Real-time dashboards
• Decision making
• Recommendations
 Stats (gaming, infrastructure monitoring)
 Prediction - often based on algorithmic
prediction
• Push stream through ML model
 Complex Event Processing
@nicolas_frankel
Persistent event-storage systems
 Kafka
 Pulsar
@nicolas_frankel
Kafka
 Distributed
 On-disk storage
 Messages sent and read from a topic
• Publish-subscribe
• Queue
 Consumer can keep track of the offset
@nicolas_frankel
In-memory stream processing engines
 Apache Flink
 Amazon Kinesis
 IBM Streams
 Hazelcast Jet
 Apache Beam
• Abstraction over some of the above
 …
@nicolas_frankel
Hazelcast Jet
 Apache 2 Open Source
 Single JAR
 Leverages Hazelcast IMDG
 Unified batch/streaming API
 (Hazelcast Jet Enterprise)
@nicolas_frankel
Hazelcast Jet
@nicolas_frankel
Concept: Pipeline
 Declaration (code) that defines and links
sources, transforms, and sinks
 Platform-specific SDK (Pipeline API in
Jet)
 Client submits pipeline to the Stream
Processing Engine (SPE)
@nicolas_frankel
Concept: Job
 Running instance of pipeline in SPE
 SPE executes the pipeline
 Code execution
 Data routing
 Flow control
 Parallel and distributed execution
@nicolas_frankel
Imperative model
final String text = "...";
final Map<String, Long> counts = new HashMap<>();
for (String word : text.split("W+")) {
Long count = counts.get(word);
counts.put(count == null ? 1L : count + 1);
}
@nicolas_frankel
Declarative model
Map<String, Long> counts = lines.stream()
.map(String::toLowerCase)
.flatMap(
line -> Arrays.stream(line.split("W+"))
)
.filter(word -> !word.isEmpty())
.collect(Collectors.groupingBy(
word -> word, Collectors.counting())
);
@nicolas_frankel
What Distributed Means to Hazelcast
 Multiple nodes
 Scalable storage and performance
 Elasticity
 Data stored, partitioned and replicated
 No single point of failure
@nicolas_frankel
Distributed Parallel Processing
Pipeline p = Pipeline.create();
p.drawFrom(Sources.<Long, String>map(BOOK_LINES))
.flatMap(line -> traverseArray(line.getValue().split("W+")))
.filter(word -> !word.isEmpty())
.groupingKey(wholeItem())
.aggregate(counting())
.drainTo(Sinks.map(COUNTS));
Data
Sink
Data
Source
from aggrmap filter to
Translate declarative code to a Directed Acyclic Graph
@nicolas_frankel
Node 1
Distributed Parallel Processing
read cmb
map
+
filter
acc sink
read cmb
map
+
filter
acc
Node 2
read cmb
map
+
filter
acc
sinkread cmb
map
+
filter
acc
Data
Source
Data
Sink
sink
sink
@nicolas_frankel
Open Data
« Open data is the idea that some data
should be freely available to everyone to
use and republish as they wish, without
restrictions from copyright, patents or
other mechanisms of control. »
--https://en.wikipedia.org/wiki/Open_data
@nicolas_frankel
Some Open Data initiatives
 France:
• https://www.data.gouv.fr/fr/
 Switzerland:
• https://opendata.swiss/en/
 European Union:
• https://data.europa.eu/euodp/en/data/
@nicolas_frankel
Challenges
1. Access
2. Format
3. Standard
4. Data correctness
@nicolas_frankel
Access
 Download a file
 Access it interactively through a web-
service
@nicolas_frankel
Format
In general, Open Data means Open
Format
 PDF
 CSV
 XML
 JSON
 etc.
@nicolas_frankel
Standard
 Let’s pretend the format is XML
• Which grammar is used?
 A shared standard is required
• Congruent to a domain
@nicolas_frankel
Data correctness
"32.TA.66-43","16:20:00","16:20:00","8504304"
"32.TA.66-44","24:53:00","24:53:00","8500100"
"32.TA.66-44","25:00:00","25:00:00","8500162"
"32.TA.66-44","25:02:00","25:02:00","8500170"
"32.TA.66-45","23:32:00","23:32:00","8500170"
@nicolas_frankel
General Transit Feed Specification
”The General Transit Feed Specification (GTFS) […] defines a
common format for public transportation schedules and
associated geographic information. GTFS feeds let public
transit agencies publish their transit data and developers write
applications that consume that data in an interoperable way.”
@nicolas_frankel
GTFS static model
Filename Required Defines
agency.txt Required Transit agencies with service represented in this dataset.
stops.txt Required
Stops where vehicles pick up or drop off riders. Also defines stations and station
entrances.
routes.txt Required Transit routes. A route is a group of trips that are displayed to riders as a single service.
trips.txt Required
Trips for each route. A trip is a sequence of two or more stops that occur during a
specific time period.
stop_times.txt Required Times that a vehicle arrives at and departs from stops for each trip.
calendar.txt Conditionally required
Service dates specified using a weekly schedule with start and end dates. This file is
required unless all dates of service are defined in calendar_dates.txt.
calendar_dates.txt Conditionally required
Exceptions for the services defined in the calendar.txt. If calendar.txt is omitted, then
calendar_dates.txt is required and must contain all dates of service.
fare_attributes.txt Optional Fare information for a transit agency's routes.
@nicolas_frankel
GTFS static model
Filename Required Defines
fare_rules.txt Optional Rules to apply fares for itineraries.
shapes.txt Optional Rules for mapping vehicle travel paths, sometimes referred to as route alignments.
frequencies.txt Optional
Headway (time between trips) for headway-based service or a compressed representation of fixed-schedule
service.
transfers.txt Optional Rules for making connections at transfer points between routes.
pathways.txt Optional Pathways linking together locations within stations.
levels.txt Optional Levels within stations.
feed_info.txt Optional Dataset metadata, including publisher, version, and expiration information.
translations.txt Optional Translated information of a transit agency.
attributions.txt Optional Specifies the attributions that are applied to the dataset.
@nicolas_frankel
GTFS dynamic model
@nicolas_frankel
Use-case: Swiss Public Transport
 Open Data
 GTFS static available as downloadable
.txt files
 GTFS dynamic available as a REST
endpoint
@nicolas_frankel
The available… … data model
Where’s the position?!
@nicolas_frankel
The dynamic data pipeline
1. Source: web service
2. Split into trip updates
3. Enrich with trip data
4. Enrich with stop times data
5. Transform hours into timestamp
6. Enrich with location data
7. Sink: Hazelcast IMDG
@nicolas_frankel
Architecture overview
@nicolas_frankel
@nicolas_frankel
Recap
 Streaming has a lot of benefits
 Leverage Open Data
 It’s the Wild West out there
• No standards
• Real-world data sucks!
 But you can get cool stuff done
@nicolas_frankel
Thanks a lot!
 https://blog.frankel.ch/
 @nicolas_frankel
 https://jet-start.sh/
 https://bit.ly/gtransportfs
 https://bit.ly/jet-train

Mais conteúdo relacionado

Mais procurados

Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...confluent
 
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...HostedbyConfluent
 
Confluent Cloud for Apache Kafka® | Google Cloud Next ’19
Confluent Cloud for Apache Kafka® | Google Cloud Next ’19Confluent Cloud for Apache Kafka® | Google Cloud Next ’19
Confluent Cloud for Apache Kafka® | Google Cloud Next ’19confluent
 
Pipelining the Heroes with Kafka and Graph
Pipelining the Heroes with Kafka and GraphPipelining the Heroes with Kafka and Graph
Pipelining the Heroes with Kafka and Graphconfluent
 
Apache Kafka in the Airline, Aviation and Travel Industry
Apache Kafka in the Airline, Aviation and Travel IndustryApache Kafka in the Airline, Aviation and Travel Industry
Apache Kafka in the Airline, Aviation and Travel IndustryKai Wähner
 
Serverless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Serverless Kafka on AWS as Part of a Cloud-native Data Lake ArchitectureServerless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Serverless Kafka on AWS as Part of a Cloud-native Data Lake ArchitectureKai Wähner
 
Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...
Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...
Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...confluent
 
Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....
Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....
Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....Kai Wähner
 
Ingesting IoT data in Food Processing
Ingesting IoT data in Food ProcessingIngesting IoT data in Food Processing
Ingesting IoT data in Food Processingconfluent
 
Connecting Apache Kafka to Cash
Connecting Apache Kafka to CashConnecting Apache Kafka to Cash
Connecting Apache Kafka to Cashconfluent
 
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...Kai Wähner
 
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...Kai Wähner
 
Understanding the TCO and ROI of Apache Kafka & Confluent
Understanding the TCO and ROI of Apache Kafka & ConfluentUnderstanding the TCO and ROI of Apache Kafka & Confluent
Understanding the TCO and ROI of Apache Kafka & Confluentconfluent
 
IBM Cloud Pak for Integration with Confluent Platform powered by Apache Kafka
IBM Cloud Pak for Integration with Confluent Platform powered by Apache KafkaIBM Cloud Pak for Integration with Confluent Platform powered by Apache Kafka
IBM Cloud Pak for Integration with Confluent Platform powered by Apache KafkaKai Wähner
 
The Rise Of Event Streaming – Why Apache Kafka Changes Everything
The Rise Of Event Streaming – Why Apache Kafka Changes EverythingThe Rise Of Event Streaming – Why Apache Kafka Changes Everything
The Rise Of Event Streaming – Why Apache Kafka Changes EverythingKai Wähner
 
APAC Confluent Consumer Data Right the Lowdown and the Lessons
APAC Confluent Consumer Data Right the Lowdown and the LessonsAPAC Confluent Consumer Data Right the Lowdown and the Lessons
APAC Confluent Consumer Data Right the Lowdown and the Lessonsconfluent
 
Stream me to the Cloud (and back) with Confluent & MongoDB
Stream me to the Cloud (and back) with Confluent & MongoDBStream me to the Cloud (and back) with Confluent & MongoDB
Stream me to the Cloud (and back) with Confluent & MongoDBconfluent
 
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...confluent
 
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...Kai Wähner
 
Mainframe Integration, Offloading and Replacement with Apache Kafka
Mainframe Integration, Offloading and Replacement with Apache KafkaMainframe Integration, Offloading and Replacement with Apache Kafka
Mainframe Integration, Offloading and Replacement with Apache KafkaKai Wähner
 

Mais procurados (20)

Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
 
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
 
Confluent Cloud for Apache Kafka® | Google Cloud Next ’19
Confluent Cloud for Apache Kafka® | Google Cloud Next ’19Confluent Cloud for Apache Kafka® | Google Cloud Next ’19
Confluent Cloud for Apache Kafka® | Google Cloud Next ’19
 
Pipelining the Heroes with Kafka and Graph
Pipelining the Heroes with Kafka and GraphPipelining the Heroes with Kafka and Graph
Pipelining the Heroes with Kafka and Graph
 
Apache Kafka in the Airline, Aviation and Travel Industry
Apache Kafka in the Airline, Aviation and Travel IndustryApache Kafka in the Airline, Aviation and Travel Industry
Apache Kafka in the Airline, Aviation and Travel Industry
 
Serverless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Serverless Kafka on AWS as Part of a Cloud-native Data Lake ArchitectureServerless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Serverless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
 
Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...
Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...
Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...
 
Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....
Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....
Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....
 
Ingesting IoT data in Food Processing
Ingesting IoT data in Food ProcessingIngesting IoT data in Food Processing
Ingesting IoT data in Food Processing
 
Connecting Apache Kafka to Cash
Connecting Apache Kafka to CashConnecting Apache Kafka to Cash
Connecting Apache Kafka to Cash
 
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
 
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
 
Understanding the TCO and ROI of Apache Kafka & Confluent
Understanding the TCO and ROI of Apache Kafka & ConfluentUnderstanding the TCO and ROI of Apache Kafka & Confluent
Understanding the TCO and ROI of Apache Kafka & Confluent
 
IBM Cloud Pak for Integration with Confluent Platform powered by Apache Kafka
IBM Cloud Pak for Integration with Confluent Platform powered by Apache KafkaIBM Cloud Pak for Integration with Confluent Platform powered by Apache Kafka
IBM Cloud Pak for Integration with Confluent Platform powered by Apache Kafka
 
The Rise Of Event Streaming – Why Apache Kafka Changes Everything
The Rise Of Event Streaming – Why Apache Kafka Changes EverythingThe Rise Of Event Streaming – Why Apache Kafka Changes Everything
The Rise Of Event Streaming – Why Apache Kafka Changes Everything
 
APAC Confluent Consumer Data Right the Lowdown and the Lessons
APAC Confluent Consumer Data Right the Lowdown and the LessonsAPAC Confluent Consumer Data Right the Lowdown and the Lessons
APAC Confluent Consumer Data Right the Lowdown and the Lessons
 
Stream me to the Cloud (and back) with Confluent & MongoDB
Stream me to the Cloud (and back) with Confluent & MongoDBStream me to the Cloud (and back) with Confluent & MongoDB
Stream me to the Cloud (and back) with Confluent & MongoDB
 
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
 
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
 
Mainframe Integration, Offloading and Replacement with Apache Kafka
Mainframe Integration, Offloading and Replacement with Apache KafkaMainframe Integration, Offloading and Replacement with Apache Kafka
Mainframe Integration, Offloading and Replacement with Apache Kafka
 

Semelhante a JUG Tirana - Introduction to data streaming

SCALE - Stream processing and Open Data, a match made in Heaven
SCALE - Stream processing and Open Data, a match made in HeavenSCALE - Stream processing and Open Data, a match made in Heaven
SCALE - Stream processing and Open Data, a match made in HeavenNicolas Fränkel
 
BruJUG - Introduction to data streaming
BruJUG - Introduction to data streamingBruJUG - Introduction to data streaming
BruJUG - Introduction to data streamingNicolas Fränkel
 
WaJUG - Introduction to data streaming
WaJUG - Introduction to data streamingWaJUG - Introduction to data streaming
WaJUG - Introduction to data streamingNicolas Fränkel
 
JUG SF - Introduction to data streaming
JUG SF - Introduction to data streamingJUG SF - Introduction to data streaming
JUG SF - Introduction to data streamingNicolas Fränkel
 
BigData conference - Introduction to stream processing
BigData conference - Introduction to stream processingBigData conference - Introduction to stream processing
BigData conference - Introduction to stream processingNicolas Fränkel
 
Devclub.lv - Introduction to stream processing
Devclub.lv - Introduction to stream processingDevclub.lv - Introduction to stream processing
Devclub.lv - Introduction to stream processingNicolas Fränkel
 
Big data serving: Processing and inference at scale in real time
Big data serving: Processing and inference at scale in real timeBig data serving: Processing and inference at scale in real time
Big data serving: Processing and inference at scale in real timeItai Yaffe
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataDataWorks Summit/Hadoop Summit
 
The End of a Myth: Ultra-Scalable Transactional Management
The End of a Myth: Ultra-Scalable Transactional ManagementThe End of a Myth: Ultra-Scalable Transactional Management
The End of a Myth: Ultra-Scalable Transactional ManagementRicardo Jimenez-Peris
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Apache Apex
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Dataconomy Media
 
Cooperative Task Execution for Apache Spark
Cooperative Task Execution for Apache SparkCooperative Task Execution for Apache Spark
Cooperative Task Execution for Apache SparkDatabricks
 
[DSC Europe 23] Pramod Immaneni - Real-time analytics at IoT scale
[DSC Europe 23] Pramod Immaneni - Real-time analytics at IoT scale[DSC Europe 23] Pramod Immaneni - Real-time analytics at IoT scale
[DSC Europe 23] Pramod Immaneni - Real-time analytics at IoT scaleDataScienceConferenc1
 
Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Jim Dowling
 
Integration Patterns for Big Data Applications
Integration Patterns for Big Data ApplicationsIntegration Patterns for Big Data Applications
Integration Patterns for Big Data ApplicationsMichael Häusler
 
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...Flink Forward
 
Analysing data analytics use cases to understand big data platform
Analysing data analytics use cases  to understand big data platformAnalysing data analytics use cases  to understand big data platform
Analysing data analytics use cases to understand big data platformdataeaze systems
 
Analysing data analytics use cases to understand big data platform
Analysing data analytics use cases  to understand big data platformAnalysing data analytics use cases  to understand big data platform
Analysing data analytics use cases to understand big data platformdataeaze systems
 

Semelhante a JUG Tirana - Introduction to data streaming (20)

SCALE - Stream processing and Open Data, a match made in Heaven
SCALE - Stream processing and Open Data, a match made in HeavenSCALE - Stream processing and Open Data, a match made in Heaven
SCALE - Stream processing and Open Data, a match made in Heaven
 
BruJUG - Introduction to data streaming
BruJUG - Introduction to data streamingBruJUG - Introduction to data streaming
BruJUG - Introduction to data streaming
 
WaJUG - Introduction to data streaming
WaJUG - Introduction to data streamingWaJUG - Introduction to data streaming
WaJUG - Introduction to data streaming
 
JUG SF - Introduction to data streaming
JUG SF - Introduction to data streamingJUG SF - Introduction to data streaming
JUG SF - Introduction to data streaming
 
BigData conference - Introduction to stream processing
BigData conference - Introduction to stream processingBigData conference - Introduction to stream processing
BigData conference - Introduction to stream processing
 
Devclub.lv - Introduction to stream processing
Devclub.lv - Introduction to stream processingDevclub.lv - Introduction to stream processing
Devclub.lv - Introduction to stream processing
 
Big data serving: Processing and inference at scale in real time
Big data serving: Processing and inference at scale in real timeBig data serving: Processing and inference at scale in real time
Big data serving: Processing and inference at scale in real time
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing data
 
The End of a Myth: Ultra-Scalable Transactional Management
The End of a Myth: Ultra-Scalable Transactional ManagementThe End of a Myth: Ultra-Scalable Transactional Management
The End of a Myth: Ultra-Scalable Transactional Management
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
 
Cooperative Task Execution for Apache Spark
Cooperative Task Execution for Apache SparkCooperative Task Execution for Apache Spark
Cooperative Task Execution for Apache Spark
 
[DSC Europe 23] Pramod Immaneni - Real-time analytics at IoT scale
[DSC Europe 23] Pramod Immaneni - Real-time analytics at IoT scale[DSC Europe 23] Pramod Immaneni - Real-time analytics at IoT scale
[DSC Europe 23] Pramod Immaneni - Real-time analytics at IoT scale
 
Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks
 
Integration Patterns for Big Data Applications
Integration Patterns for Big Data ApplicationsIntegration Patterns for Big Data Applications
Integration Patterns for Big Data Applications
 
Enterprise Data Lakes
Enterprise Data LakesEnterprise Data Lakes
Enterprise Data Lakes
 
Stream Processing Overview
Stream Processing OverviewStream Processing Overview
Stream Processing Overview
 
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...
 
Analysing data analytics use cases to understand big data platform
Analysing data analytics use cases  to understand big data platformAnalysing data analytics use cases  to understand big data platform
Analysing data analytics use cases to understand big data platform
 
Analysing data analytics use cases to understand big data platform
Analysing data analytics use cases  to understand big data platformAnalysing data analytics use cases  to understand big data platform
Analysing data analytics use cases to understand big data platform
 

Mais de Nicolas Fränkel

SnowCamp - Adding search to a legacy application
SnowCamp - Adding search to a legacy applicationSnowCamp - Adding search to a legacy application
SnowCamp - Adding search to a legacy applicationNicolas Fränkel
 
Un CV de dévelopeur toujours a jour
Un CV de dévelopeur toujours a jourUn CV de dévelopeur toujours a jour
Un CV de dévelopeur toujours a jourNicolas Fränkel
 
Zero-downtime deployment on Kubernetes with Hazelcast
Zero-downtime deployment on Kubernetes with HazelcastZero-downtime deployment on Kubernetes with Hazelcast
Zero-downtime deployment on Kubernetes with HazelcastNicolas Fränkel
 
jLove - A Change-Data-Capture use-case: designing an evergreen cache
jLove - A Change-Data-Capture use-case: designing an evergreen cachejLove - A Change-Data-Capture use-case: designing an evergreen cache
jLove - A Change-Data-Capture use-case: designing an evergreen cacheNicolas Fränkel
 
ADDO - Your own Kubernetes controller, not only in Go
ADDO - Your own Kubernetes controller, not only in GoADDO - Your own Kubernetes controller, not only in Go
ADDO - Your own Kubernetes controller, not only in GoNicolas Fränkel
 
TestCon Europe - Mutation Testing to the Rescue of Your Tests
TestCon Europe - Mutation Testing to the Rescue of Your TestsTestCon Europe - Mutation Testing to the Rescue of Your Tests
TestCon Europe - Mutation Testing to the Rescue of Your TestsNicolas Fränkel
 
OSCONF Jaipur - A Hitchhiker's Tour to Containerizing a Java application
OSCONF Jaipur - A Hitchhiker's Tour to Containerizing a Java applicationOSCONF Jaipur - A Hitchhiker's Tour to Containerizing a Java application
OSCONF Jaipur - A Hitchhiker's Tour to Containerizing a Java applicationNicolas Fränkel
 
GeekcampSG 2020 - A Change-Data-Capture use-case: designing an evergreen cache
GeekcampSG 2020 - A Change-Data-Capture use-case: designing an evergreen cacheGeekcampSG 2020 - A Change-Data-Capture use-case: designing an evergreen cache
GeekcampSG 2020 - A Change-Data-Capture use-case: designing an evergreen cacheNicolas Fränkel
 
JavaDay Istanbul - 3 improvements in your microservices architecture
JavaDay Istanbul - 3 improvements in your microservices architectureJavaDay Istanbul - 3 improvements in your microservices architecture
JavaDay Istanbul - 3 improvements in your microservices architecture Nicolas Fränkel
 
OSCONF Hyderabad - Shorten all URLs!
OSCONF Hyderabad - Shorten all URLs!OSCONF Hyderabad - Shorten all URLs!
OSCONF Hyderabad - Shorten all URLs!Nicolas Fränkel
 
OSCONF Koshi - Zero downtime deployment with Kubernetes, Flyway and Spring Boot
OSCONF Koshi - Zero downtime deployment with Kubernetes, Flyway and Spring BootOSCONF Koshi - Zero downtime deployment with Kubernetes, Flyway and Spring Boot
OSCONF Koshi - Zero downtime deployment with Kubernetes, Flyway and Spring BootNicolas Fränkel
 
JOnConf - A CDC use-case: designing an Evergreen Cache
JOnConf - A CDC use-case: designing an Evergreen CacheJOnConf - A CDC use-case: designing an Evergreen Cache
JOnConf - A CDC use-case: designing an Evergreen CacheNicolas Fränkel
 
London In-Memory Computing Meetup - A Change-Data-Capture use-case: designing...
London In-Memory Computing Meetup - A Change-Data-Capture use-case: designing...London In-Memory Computing Meetup - A Change-Data-Capture use-case: designing...
London In-Memory Computing Meetup - A Change-Data-Capture use-case: designing...Nicolas Fränkel
 
Java.IL - Your own Kubernetes controller, not only in Go!
Java.IL - Your own Kubernetes controller, not only in Go!Java.IL - Your own Kubernetes controller, not only in Go!
Java.IL - Your own Kubernetes controller, not only in Go!Nicolas Fränkel
 
vJUG - Introduction to data streaming
vJUG - Introduction to data streamingvJUG - Introduction to data streaming
vJUG - Introduction to data streamingNicolas Fränkel
 
London Java Community - An Experiment in Continuous Deployment of JVM applica...
London Java Community - An Experiment in Continuous Deployment of JVM applica...London Java Community - An Experiment in Continuous Deployment of JVM applica...
London Java Community - An Experiment in Continuous Deployment of JVM applica...Nicolas Fränkel
 
OSCONF - Your own Kubernetes controller: not only in Go
OSCONF - Your own Kubernetes controller: not only in GoOSCONF - Your own Kubernetes controller: not only in Go
OSCONF - Your own Kubernetes controller: not only in GoNicolas Fränkel
 
vKUG - Migrating Spring Boot apps from annotation-based config to Functional
vKUG - Migrating Spring Boot apps from annotation-based config to FunctionalvKUG - Migrating Spring Boot apps from annotation-based config to Functional
vKUG - Migrating Spring Boot apps from annotation-based config to FunctionalNicolas Fränkel
 
Tech talks - 3 performance improvements
Tech talks - 3 performance improvementsTech talks - 3 performance improvements
Tech talks - 3 performance improvementsNicolas Fränkel
 
AllTheTalks.online - A Streaming Use-Case: And Experiment in Continuous Deplo...
AllTheTalks.online - A Streaming Use-Case: And Experiment in Continuous Deplo...AllTheTalks.online - A Streaming Use-Case: And Experiment in Continuous Deplo...
AllTheTalks.online - A Streaming Use-Case: And Experiment in Continuous Deplo...Nicolas Fränkel
 

Mais de Nicolas Fränkel (20)

SnowCamp - Adding search to a legacy application
SnowCamp - Adding search to a legacy applicationSnowCamp - Adding search to a legacy application
SnowCamp - Adding search to a legacy application
 
Un CV de dévelopeur toujours a jour
Un CV de dévelopeur toujours a jourUn CV de dévelopeur toujours a jour
Un CV de dévelopeur toujours a jour
 
Zero-downtime deployment on Kubernetes with Hazelcast
Zero-downtime deployment on Kubernetes with HazelcastZero-downtime deployment on Kubernetes with Hazelcast
Zero-downtime deployment on Kubernetes with Hazelcast
 
jLove - A Change-Data-Capture use-case: designing an evergreen cache
jLove - A Change-Data-Capture use-case: designing an evergreen cachejLove - A Change-Data-Capture use-case: designing an evergreen cache
jLove - A Change-Data-Capture use-case: designing an evergreen cache
 
ADDO - Your own Kubernetes controller, not only in Go
ADDO - Your own Kubernetes controller, not only in GoADDO - Your own Kubernetes controller, not only in Go
ADDO - Your own Kubernetes controller, not only in Go
 
TestCon Europe - Mutation Testing to the Rescue of Your Tests
TestCon Europe - Mutation Testing to the Rescue of Your TestsTestCon Europe - Mutation Testing to the Rescue of Your Tests
TestCon Europe - Mutation Testing to the Rescue of Your Tests
 
OSCONF Jaipur - A Hitchhiker's Tour to Containerizing a Java application
OSCONF Jaipur - A Hitchhiker's Tour to Containerizing a Java applicationOSCONF Jaipur - A Hitchhiker's Tour to Containerizing a Java application
OSCONF Jaipur - A Hitchhiker's Tour to Containerizing a Java application
 
GeekcampSG 2020 - A Change-Data-Capture use-case: designing an evergreen cache
GeekcampSG 2020 - A Change-Data-Capture use-case: designing an evergreen cacheGeekcampSG 2020 - A Change-Data-Capture use-case: designing an evergreen cache
GeekcampSG 2020 - A Change-Data-Capture use-case: designing an evergreen cache
 
JavaDay Istanbul - 3 improvements in your microservices architecture
JavaDay Istanbul - 3 improvements in your microservices architectureJavaDay Istanbul - 3 improvements in your microservices architecture
JavaDay Istanbul - 3 improvements in your microservices architecture
 
OSCONF Hyderabad - Shorten all URLs!
OSCONF Hyderabad - Shorten all URLs!OSCONF Hyderabad - Shorten all URLs!
OSCONF Hyderabad - Shorten all URLs!
 
OSCONF Koshi - Zero downtime deployment with Kubernetes, Flyway and Spring Boot
OSCONF Koshi - Zero downtime deployment with Kubernetes, Flyway and Spring BootOSCONF Koshi - Zero downtime deployment with Kubernetes, Flyway and Spring Boot
OSCONF Koshi - Zero downtime deployment with Kubernetes, Flyway and Spring Boot
 
JOnConf - A CDC use-case: designing an Evergreen Cache
JOnConf - A CDC use-case: designing an Evergreen CacheJOnConf - A CDC use-case: designing an Evergreen Cache
JOnConf - A CDC use-case: designing an Evergreen Cache
 
London In-Memory Computing Meetup - A Change-Data-Capture use-case: designing...
London In-Memory Computing Meetup - A Change-Data-Capture use-case: designing...London In-Memory Computing Meetup - A Change-Data-Capture use-case: designing...
London In-Memory Computing Meetup - A Change-Data-Capture use-case: designing...
 
Java.IL - Your own Kubernetes controller, not only in Go!
Java.IL - Your own Kubernetes controller, not only in Go!Java.IL - Your own Kubernetes controller, not only in Go!
Java.IL - Your own Kubernetes controller, not only in Go!
 
vJUG - Introduction to data streaming
vJUG - Introduction to data streamingvJUG - Introduction to data streaming
vJUG - Introduction to data streaming
 
London Java Community - An Experiment in Continuous Deployment of JVM applica...
London Java Community - An Experiment in Continuous Deployment of JVM applica...London Java Community - An Experiment in Continuous Deployment of JVM applica...
London Java Community - An Experiment in Continuous Deployment of JVM applica...
 
OSCONF - Your own Kubernetes controller: not only in Go
OSCONF - Your own Kubernetes controller: not only in GoOSCONF - Your own Kubernetes controller: not only in Go
OSCONF - Your own Kubernetes controller: not only in Go
 
vKUG - Migrating Spring Boot apps from annotation-based config to Functional
vKUG - Migrating Spring Boot apps from annotation-based config to FunctionalvKUG - Migrating Spring Boot apps from annotation-based config to Functional
vKUG - Migrating Spring Boot apps from annotation-based config to Functional
 
Tech talks - 3 performance improvements
Tech talks - 3 performance improvementsTech talks - 3 performance improvements
Tech talks - 3 performance improvements
 
AllTheTalks.online - A Streaming Use-Case: And Experiment in Continuous Deplo...
AllTheTalks.online - A Streaming Use-Case: And Experiment in Continuous Deplo...AllTheTalks.online - A Streaming Use-Case: And Experiment in Continuous Deplo...
AllTheTalks.online - A Streaming Use-Case: And Experiment in Continuous Deplo...
 

Último

Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 

Último (20)

Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 

JUG Tirana - Introduction to data streaming

Notas do Editor

  1. Real-time (latency-sensitive) operations combined with analytics Count usages per CC in last 10 secs, fraud if > 10 Real-time querying Based on analytics, prediction Fraud detection ran overnight has low value Complex event processing Pattern detection (if A and B -> C) SPE runs this at scale Valuable: IOT support. Machine analytics/predictions - fits into AI Without streaming?
  2. SP - make use of multi-processor multi-node runtime, Minimize costs: data shuffling, context switching Jet Does Distributed Parallel Processing 1/ Execution plan 2/ Execute it in parallel How can be computation parallelized? Task parallelism - make use of multiprocessor machines Continuous - run in parallel and exchange data MR - just two steps
  3. SP - make use of multi-processor multi-node runtime, Minimize costs: data shuffling, context switching 1/ Data parallelism - distribute data partitions among available resources DAG deployed to all cluster members = more cores What can be parallelized? Source - if partitioned, can read in parallel Map/filter - can run in parallel Extend the edges Shuffing / moving data is expensive Keeps data local, even reads it locally when co-located with the data source
  4. @startuml class FeedMessage class FeedHeader { gtfs_realtime_version: string timestamp: uint64 } enum Incrementality { FULL_DATASET DIFFERENTIAL } class FeedEntity { id: String is_deleted: boolean } class TripUpdate { timestamp: uint64 delay: int32 } class VehiclePosition { current_stop_sequence: uint32 stop_id: string timestamp: uint64 } enum VehicleStopStatus { INCOMING_AT STOPPED_AT IN_TRANSIT_TO } enum CongestionLevel { UNKNOWN_CONGESTION_LEVEL RUNNING_SMOOTHLY STOP_AND_GO CONGESTION SEVERE_CONGESTION } class Alert enum Cause { UNKNOWN_CAUSE OTHER_CAUSE TECHNICAL_PROBLEM STRIKE DEMONSTRATION ACCIDENT HOLIDAY WEATHER MAINTENANCE CONSTRUCTION POLICE_ACTIVITY MEDICAL_EMERGENCY } enum Effect { NO_SERVICE REDUCED_SERVICE SIGNIFICANT_DELAYS DETOUR ADDITIONAL_SERVICE MODIFIED_SERVICE OTHER_EFFECT UNKNOWN_EFFECT STOP_MOVED } class TimeRange { start: uint64 end: uint64 } class Position { latitude: float longitude: float bearing: float odometer: double speed: float } class TripDescriptor { trip_id: String route_id: String direction_id: uint32 start_time: string start_date: string } class VehicleDescriptor { id: string label: string license_plate: string } class StopTimeUpdate { stop_sequence: uint32 stop_id: string } class StopTimeEvent { delay: uint32 time: int64 uncertainty: int32 } enum ScheduleRelationship { SCHEDULED SKIPPED NO_DATA } class TripDescriptor { trip_id: string route_id: string direction_id: uint32 start_time: string start_date: string } enum ScheduleRelationship2 as "ScheduleRelationship" { SCHEDULED ADDED UNSCHEDULED CANCELED } class EntitySelector { agency_id: string route_id: string route_type: int32 stop_id: string } class Translation { text: string language: string } FeedMessage -up-> "1" FeedHeader: header FeedMessage -down-> "*" FeedEntity: entity FeedHeader -right-> "1" Incrementality FeedEntity --> "0..1" TripUpdate FeedEntity -left-> "0..1" VehiclePosition FeedEntity -right-> "0..1" Alert TripUpdate --> "1" TripDescriptor: trip TripUpdate -left-> "0..1" VehicleDescriptor: vehicle TripUpdate --> "*" StopTimeUpdate StopTimeUpdate -left-> "0..1" StopTimeEvent: arrival StopTimeUpdate -left-> "0..1" StopTimeEvent: departure StopTimeUpdate --> "0..1" ScheduleRelationship TripDescriptor -right-> "0..1" ScheduleRelationship2 VehiclePosition --> "0..1" TripDescriptor: trip VehiclePosition --> "0..1" VehicleDescriptor: vehicle VehiclePosition -left-> "0..1" Position: vehicle VehiclePosition -up-> "0..1" VehicleStopStatus: current_status VehiclePosition -up-> "0..1" CongestionLevel Alert --> "*" TimeRange: active_period Alert --> "1..*" EntitySelector: informed_entity Alert -up-> "0..1" Cause Alert -up-> "0..1" Effect Alert -right-> "0..1" TranslatedString: url Alert -right-> "1" TranslatedString: header_text Alert -right-> "1" TranslatedString: description_text EntitySelector --> "0..1" TripDescriptor: trip TranslatedString --> "1..*" Translation note left of FeedMessage: Root message hide empty members @enduml
  5. node "Hazelcast Jet" as jet { database "Hazelcast IMDG" as imdg artifact "Load reference data Job" as staticjob artifact "Load dynamic data Job" as dynamicjob folder "Reference data files" as refdata { file trips.txt file routes.txt } } component "Reference data loader" <<Loader>> as staticloader component "Dynamic data loader" <<Loader>> as dynamicloader component "Web application" <<Spring Boot>> as webapp cloud { interface "Open Data endpoint" as ws } staticloader --> staticjob: Send job staticjob --> refdata: Read files staticjob --> imdg: Store JSON dynamicloader --> dynamicjob: Send job dynamicjob -right-> ws: Call REST endpoint dynamicjob --> imdg: Store JSON webapp -left-> imdg: Register to changes