Big Data Analytics Platforms by KTH and RISE SICS
1. Seif Haridi KTH/RISE
AI @ RISE
Hopsworks, Apache Flink and Beyond
Big Data Analytics Platforms
By KTH and RISE SICS
2.
3. Hopsworks: End2End Data Platform for Analytics/ML
[Diagram: Hopsworks architecture]
Inputs: Datasources, Apache Kafka
Outputs: Applications, API, Dashboards
Pipeline stages: Data Preparation & Ingestion; Experimentation & Model Training; Deploy & Productionalize
Batch and streaming processing: Apache Beam, Apache Spark, Apache Flink, Kafka + Spark Streaming
ML tooling: Pip, Conda, Tensorflow, scikit-learn, PyTorch, Jupyter Notebooks, Tensorboard, Distributed ML & DL
Serving: Kubernetes, Model Serving, Model Monitoring
Shared services: Hopsworks Feature Store, Orchestration in Airflow
Filesystem and Metadata storage: HopsFS
4. Logical Clocks was founded by the team that created and continues to drive Hopsworks, a Data-Intensive AI platform, and its Feature Store, a warehouse for machine learning features.
Logical Clocks’ vision is to simplify the process of refining data into intelligence at scale.
5. Continuous Intelligence
A design pattern in which real-time analytics are integrated within a business operation, processing current and historical data to prescribe actions in response to events.
[Diagram: Business and Tech, linked by events and actions]
https://www.gartner.com/en/newsroom/press-releases/2019-02-18-gartner-identifies-top-10-data-and-analytics-technolo
6. Paradigm Shift in Data Processing
• Data Stream Processing as a 24/7 execution paradigm
[Diagram: paradigm shift]
• Data + lots of queries → retrospective answers
• Query + lots of data → real-time answers
The Real-Time Analytics Stack
• High Level Models: Stream SQL, CEP…
• Compute: Flink, Beam, Kafka-Streams, Apex, Storm, Spark Streaming…
• Storage: Kafka, Pub/Sub, Kinesis, Pravega…
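The shift from many queries over stored data to one long-lived query over arriving data can be sketched in a few lines of Python. This is a minimal illustration only, not code from any of the systems named on the slide; the `Event` shape and the function names `batch_query` and `standing_query` are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[str, int]  # (key, value); a hypothetical event shape


def batch_query(stored: Iterable[Event]) -> Dict[str, int]:
    """Retrospective answer: one query over data that is already at rest."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in stored:
        totals[key] += value
    return dict(totals)


def standing_query(stream: Iterable[Event]) -> Iterator[Tuple[str, int]]:
    """Real-time answers: one long-lived query, updated on every arriving event."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in stream:
        totals[key] += value
        yield key, totals[key]  # an updated answer is available immediately


events = [("a", 1), ("b", 5), ("a", 2)]
batch_result = batch_query(events)
stream_results = list(standing_query(iter(events)))
```

The batch function answers once, after all data has been collected; the standing query emits a fresh answer per event and never needs to see the whole data set.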
7. Actors vs Streams
Actor Programming
• Low-Level Event-Based Programming
• Manual/External State
• Not Robust: Manual Fault Tolerance
• Not flexible scaling
vs
Data Stream Computing
• Declarative Programming
• State Managed by the system
• Robust: Built-in Fault Tolerance
• Scalable Deployments
[Diagram: services with their own logic and state (actors) versus a declarative program (streams)]
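The contrast between manual state and system-managed state can be sketched as follows. This is a toy illustration under stated assumptions: `CounterActor` and `KeyedReduce` are hypothetical names, and the tiny `run` loop stands in for a real streaming runtime, which would also checkpoint and redistribute the keyed state.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple


class CounterActor:
    """Actor style: the handler owns its state and must persist or recover it itself."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}  # manual state, lost if the actor crashes

    def on_event(self, user: str) -> int:
        self.counts[user] = self.counts.get(user, 0) + 1
        return self.counts[user]


class KeyedReduce:
    """Stream style: the program only declares a key and a reduce function.

    The (hypothetical) runtime owns the per-key state, so it is the runtime
    that could snapshot, restore or repartition it.
    """

    def __init__(self, key_fn: Callable[[Any], str],
                 reduce_fn: Callable[[int, Any], int], initial: int) -> None:
        self.key_fn = key_fn
        self.reduce_fn = reduce_fn
        self.initial = initial

    def run(self, stream: Iterable[Any]) -> List[Tuple[str, int]]:
        state: Dict[str, int] = {}  # owned by the runtime, not by user code
        out: List[Tuple[str, int]] = []
        for event in stream:
            key = self.key_fn(event)
            state[key] = self.reduce_fn(state.get(key, self.initial), event)
            out.append((key, state[key]))
        return out


clicks = ["ann", "bob", "ann"]

actor = CounterActor()
actor_out = [(user, actor.on_event(user)) for user in clicks]

stream_out = KeyedReduce(
    key_fn=lambda user: user,
    reduce_fn=lambda acc, _event: acc + 1,
    initial=0,
).run(clicks)
```

Both variants compute the same counts; the difference is who is responsible for the state when something fails or the deployment is rescaled.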
19. Flink as an Anomaly-Detection Engine for the Cloud (2018)
• Activity-Based Threat Protection
• Behavioural model per cloud user
• Detect outliers/suspicious behavior
• Cross-reference suspicious users
• Alert admins within seconds
• 8 data clusters, many TB of state
• 30k events per second
"We needed a stateful and scalable stream processing framework. We tested everything (Azure ML/Streams, MS Orleans, Apache Storm/Samza/Spark/Ignite/Beam etc.) and chose Flink." - Yonatan Most & Avihai Berkovitz
https://www.slideshare.net/FlinkForward/flink-forward-berlin-2018-yonatan-most-avihai-berkovitz-anomaly-detection-engine-for-cloud-activities-using-flink
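A per-user behavioural model with outlier detection can be sketched as keyed state plus a simple statistical test. The slide does not say which model the engine uses, so the running mean and standard deviation (Welford's algorithm) with a z-score threshold below is a swapped-in stand-in; `RunningStats`, `detect_outliers`, `threshold` and `min_samples` are all hypothetical names chosen for the example.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Iterable, List, Tuple


@dataclass
class RunningStats:
    """Welford's online mean/variance; one instance per user (the keyed state)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def detect_outliers(events: Iterable[Tuple[str, float]],
                    threshold: float = 3.0,
                    min_samples: int = 5) -> List[Tuple[str, float]]:
    """Flag events that deviate strongly from the user's own history."""
    models: DefaultDict[str, RunningStats] = defaultdict(RunningStats)
    alerts: List[Tuple[str, float]] = []
    for user, value in events:
        stats = models[user]
        std = stats.std()
        # Only score users with enough history and non-zero spread.
        if stats.n >= min_samples and std > 0:
            if abs(value - stats.mean) / std > threshold:
                alerts.append((user, value))
        stats.update(value)  # the model keeps learning from every event
    return alerts


events = [("alice", v) for v in (10, 11, 9, 10, 12, 10)]
events += [("alice", 500), ("bob", 500)]
alerts = detect_outliers(events)
```

The same value (500) is an outlier for a user with a quiet history and unremarkable for a user with no history yet, which is the point of keeping one model per user.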
20. Data Streaming at Mass Scale
https://data-artisans.com/blog/blink-flink-alibaba-search
• Biggest retailer in the world.
• Entire Product Search, A/B Testing, User Recommendations and Analytics Services are powered by Blink (a fork of Flink).
• 1000s of nodes actively in production.
21. Continuous Deep Analytics (CDA)
[Diagram: an unbounded (∞) stream of data passes through processing and reasoning, producing knowledge for decision making]
The goal of the CDA
• Create a Big Data platform that can support complex real-time decisions based on massive live data.
22. Real-Time and Deep Analytics for Central & Edge Clouds
Our promise and vision: From Real-Time Analytics to Continuous Deep Analytics
[Diagram: The Continuous Deep Analytics Paradigm Shift]
• Online: a query over live data gives real-time answers
• Offline: deep analytics over historic data gives a historic model
• CDA system: a live model over all data, for critical decision making
23. The Bigger Picture
Data Processing on Data Streams
• scalable, fault tolerant analytics
• event-based business logic
• out-of-order computation
• dynamic relational tables (SQL)
• event pattern-matching (CEP)
but what about deeper analytics…
• tensors
• graph algorithms
• deep learning
• feature learning
• reinforcement learning
• ….
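Of the stream processing capabilities listed on this slide, out-of-order computation is the least obvious, so here is a minimal sketch of it: counting events in event-time tumbling windows, with a watermark that decides when a window is complete. The function name `tumbling_window_counts` and the parameters `window_size` and `allowed_lateness` are assumptions for the example; production engines use richer watermark strategies than "largest timestamp seen minus a fixed delay".

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[int, str]],
    window_size: int,
    allowed_lateness: int,
) -> Tuple[List[Tuple[int, str, int]], List[Tuple[int, str]]]:
    """Count events per (window, key) by event time, not by arrival order.

    events: (event_time, key) pairs in arrival order.
    Returns (emitted, dropped): emitted holds (window_start, key, count) tuples,
    dropped holds events that arrived after their window was closed.
    """
    open_windows: DefaultDict[int, Dict[str, int]] = defaultdict(dict)
    max_event_time = float("-inf")
    emitted: List[Tuple[int, str, int]] = []
    dropped: List[Tuple[int, str]] = []

    for event_time, key in events:
        max_event_time = max(max_event_time, event_time)
        # Watermark: we assume no event older than this will still arrive.
        watermark = max_event_time - allowed_lateness
        start = event_time - event_time % window_size

        if start + window_size <= watermark:
            dropped.append((event_time, key))  # its window was already closed
        else:
            counts = open_windows[start]
            counts[key] = counts.get(key, 0) + 1

        # Close every window whose end is at or before the watermark.
        for s in sorted(w for w in open_windows if w + window_size <= watermark):
            for k, c in sorted(open_windows.pop(s).items()):
                emitted.append((s, k, c))

    # End of input: flush whatever is still open.
    for s in sorted(open_windows):
        for k, c in sorted(open_windows[s].items()):
            emitted.append((s, k, c))
    return emitted, dropped


arrivals = [(1, "a"), (4, "a"), (3, "a"), (12, "a"), (2, "a"), (25, "a"), (5, "a")]
emitted, dropped = tumbling_window_counts(arrivals, window_size=10, allowed_lateness=5)
```

Events with timestamps 3 and 2 arrive out of order but are still counted in the first window, because the watermark has not yet passed that window's end; the event with timestamp 5 arrives after the window was closed and is reported as dropped.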
24. Data Pipelines Today
• Many Frameworks/Frontends for different needs (ML Training & Serving, SQL, Streams, Tensors, Graphs)
[Diagram: a pipeline mixing relational operators (⋈, σθ, π), Streams, Dynamic Graphs, Tensor Programming, Feature Learning, Feature Engineering, AI, ML, RL, Simulation tasks, Reasoning and Model Serving]
26. The Problem & Solution
Problem
Data analytics pipelines are built on diverse programming models with hard abstraction boundaries.
Performance deteriorates from context switching, steep data movement costs and excessive type conversions.
Solution
Raise the level of abstraction through an intermediate representation (IR). The IR is a programming language that can both express and reason about each of the programming models.
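Why a shared IR helps can be shown with a toy example. The sketch below is not Arc and does not reflect its syntax or semantics; `Map`, `Filter`, `fuse_maps` and `run` are hypothetical names for a deliberately tiny IR. The idea it illustrates is that once operations from different frontends are lowered to one representation, a single optimizer pass can rewrite across what used to be a boundary, here by fusing adjacent maps so no intermediate value is handed between systems.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Union


@dataclass(frozen=True)
class Map:
    fn: Callable[[Any], Any]


@dataclass(frozen=True)
class Filter:
    pred: Callable[[Any], bool]


Op = Union[Map, Filter]


def compose(f: Callable[[Any], Any], g: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Return a function that applies f, then g."""
    return lambda x: g(f(x))


def fuse_maps(plan: List[Op]) -> List[Op]:
    """Optimizer pass: merge each run of adjacent Map operators into one."""
    fused: List[Op] = []
    for op in plan:
        if isinstance(op, Map) and fused and isinstance(fused[-1], Map):
            fused[-1] = Map(compose(fused[-1].fn, op.fn))
        else:
            fused.append(op)
    return fused


def run(plan: List[Op], data: Iterable[Any]) -> List[Any]:
    """Reference interpreter: push each element through the plan in order."""
    out: List[Any] = []
    for x in data:
        keep = True
        for op in plan:
            if isinstance(op, Map):
                x = op.fn(x)
            elif not op.pred(x):
                keep = False
                break
        if keep:
            out.append(x)
    return out


# Imagine the first Map came from a streaming frontend and the second from a
# tensor frontend; in the shared IR they are just two adjacent operators.
plan: List[Op] = [Map(lambda x: x + 1), Map(lambda x: x * 2), Filter(lambda x: x > 4)]
optimized = fuse_maps(plan)

original_result = run(plan, [0, 1, 2, 3])
optimized_result = run(optimized, [0, 1, 2, 3])
```

The optimized plan has one operator fewer and produces the same result, which is the property any such rewrite has to preserve.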
31. Arcon
Arc: an IR for expressing and optimizing computations that combine streams, relations and linear algebra
Arcon: a general-purpose distributed runtime written in Rust
[Diagram: Arc (High Level IR) → Logical Dataflow IR → Arcon runner → Hardware]
Arcon Compiler Pipeline
• Dataflow optimizations
• Compiler optimizations
• Cross-domain optimizations
Arcon runner
• Rust-based runner
• Hardware accelerated
• Dynamic task execution
Hardware
• CPU/GPU/FPGA
• Local & distributed
• Dynamic scaling
32. Arc IR
• A minimal yet feature-complete set of read/write-only types and expressions
38. Thanks
• To the CDA and HOPS teams, and in general to the distributed computing group at KTH and RISE SICS
• Please visit
• DC@KTH https://dcatkth.github.io/
• HOPS https://www.hops.io/
• LogicalClocks https://www.logicalclocks.com/