SlideShare uma empresa Scribd logo
1 de 19
Anatomy of a Data Driven
Architecture
@tamir_dresher
System Architect @ Payoneer
2
System Architect @ @tamir_dresher
Tamir Dresher
My Books:
Software Engineering Lecturer
Ruppin Academic Center
tamirdr@payoneer.com
https://www.israelclouds.com/iasaisrael
3
The need for DATA
Your Data
Data-Driven Decision Making Data-powered product
• What markets are leading and where can I expand?
• What’s slowing my process?
• Is there a correlation between the time invested
in a sale and the income from the tenant?
• What products should I recommend to this user?
• Is this action fraudulent ?
• Should I suggest a discount to this user to
raise the chance for a purchase?
4
ETL vs. ELT vs. Streaming
Transform LoadExtract
Transactional DB (OLTP) Analytical DB (OLAP)
LoadExtract
Transactional DB (OLTP)
Analytical DB (OLAP) / Storage
Transform
Events
Event
Stream
Stream
Processor
Real-time
insights
5
Lambda (λ) & Kappa (ϰ) Architectures
Speed Layer
Batch Layer Serving Layer
Real-time
views
Batched
views
Query
Data
Lambda
Speed/Streaming
Layer
Serving
Layer
Speed
views
QueryData
Kappa
Processing
6
The Data Pipeline
Architecture
7Cross Cutting and Ops
The Data Pipeline Architecture
Data Sources
Ingestion
&
Transformation
Storage
Query
&
Processing
Consumption
(Visualization,
Alerting,
Insights)
Discovery/Catalog/
Metadata Management
Security Quality&Testing Observability
8
Data Sources
Ingestion
&
Transformation
Storage
Query
&
Processing
Consumption
Databases Files Events APIs
• CDC - change data
capture
• Object Storage
• Structured/
Unstructured
• Customer tracking
• WebHooks
• Domain Events
• External Services
• Enrichment
9
Data Sources
Ingestion
&
Transformation
Storage
Query
&
Processing
Consumption
Data Integration Platforms
• Connectors to sources and destinations
• E.g. fivtran, stitchdata, rivery
Data Sources
10
Data Sources
Ingestion
&
Transformation
Storage
Query
&
Processing
Consumption
Data Integration Platforms
• Connectors to sources and destinations
• E.g. fivtran, stitch
11
Data Sources
Ingestion
&
Transformation
Storage
Query
&
Processing
Consumption
Data Integration Platforms
• Connectors to sources and destinations
• E.g. fivtran, stitchdata, rivery
Data Sources
Customized batching/micro-batching
• Spark jobs, Data libraries (Pandas, boto), Hive
• Workflows – Airflow, Dagster, Luigi
12
Data Sources
Ingestion
&
Transformation
Storage
Query
&
Processing
Consumption
Data Integration Platforms
• Connectors to sources and destinations
• E.g. fivtran, stitchdata, rivery
Data Sources
Customized batching/micro-batching
• Spark jobs, Data libraries (panda, boto), Hive
• Workflows – Airflow, Dagster, Luigi
https://towardsdatascience.com/building-a-production-level-etl-pipeline-platform-using-apache-airflow-a4cf34203fbd
13
Data Sources
Ingestion
&
Transformation
Storage
Query
&
Processing
Consumption
Data Integration Platforms
• Connectors to sources and destinations
• E.g. fivtran, stitchdata, rivery
Data Sources
Customized batching/micro-batching
• Spark jobs, Data libraries (panda, boto), Hive
• Workflows – Airflow, Dagster, Luigi
Event streaming and processing
• Messaging - Kafka, Pulsar, Kinesis, Event Hub
• Processing – Spark, Flink, Samza, Kafka
Streams, Azure Stream Analytics
14
Data Sources
Ingestion
&
Transformation
Storage
Query
&
Processing
Consumption
Data Integration Platforms
• Connectors to sources and destinations
• E.g. fivtran, stitchdata, rivery
Data Sources
Customized batching/micro-batching
• Spark jobs, Data libraries (panda, boto), Hive
• Workflows – Airflow, Dagster, Luigi
Event streaming and processing
• Messaging - Kafka, Pulsar, Kinesis, Event Hub
• Processing – Spark, Flink, Samza, Kafka
Streams, Azure Stream Analytics
-- Continuously aggregating a stream into a table with a ksqlDB push query.
CREATE STREAM locationUpdatesStream ...;
CREATE TABLE locationsPerUser AS
SELECT username, COUNT(*)
FROM locationUpdatesStream
GROUP BY username
EMIT CHANGES;
// Continuously aggregating to table
KStream<String, String> locationUpdatesStream = ...;
KTable<String, Long> locationsPerUser
= locationUpdatesStream
.groupBy((k, v) -> v.username)
.count();
https://www.confluent.io/blog/kafka-streams-tables-part-1-event-streaming/
15
Data Sources
Ingestion
&
Transformation
Storage
Query
&
Processing
Consumption
Data Warehouse
• Structured format
• Designed to quickly generate
insights based on SQL like
queries
• Modern cloud base offerings –
Redshift, BigQuery, Snowflake,
Azure Synapse
Data Lake
• Structured and non-structured
data – CSV, Parquet, Images,
Audio
• Raw and historical data
• Designed to be used by data
scientists and create models by
various languages
16
Data Sources
Ingestion
&
Transformation
Storage
Query
&
Processing
Consumption
Predictive
• Generate model
• Data Science and ML libraries –
Pandas, Numpy, R, scikit etc
• The model is periodically refreshed
Storage
Retrospective (Historical)
• Deriving Intelligence based on
statistics
• Built-in engine (Data Warehouse)
OR
• Query Engines – Presto, Impala
Real Time Analytics
• Run analytical queries
over big volumes of data
with interactive latencies.
• Apache Pinot,
Clickhouse, Druid
Data Science Platforms
• Helps managing the workflows,
productization and operations
- SageMaker, Iguazio, DataBricks etc.
17
Data Sources
Ingestion
&
Transformation
Storage
Query
&
Processing
Consumption
Custom Apps
• Execute the model – how is the model
reachable?
• Translate user/system actions to
queries
• Visualize – custom (Plotly Dash,
Streamlit etc) or embedded (PowerBI,
Looker etc)
External Apps
• Augmented Analytics - External
services to generate insights and
explain them
(e.g. Anomaly Detection with Anodot/
CrunchMetrics/outlier.ai)
• Customizable Reports and Dashboard
(e.g. Looker, Tableu, PowerBI, Sisense)
18
Summary
Data
at Rest
Data in
Motion Event/Dat
a Stream
Workflow Engine
Stream
Processor
Real-time
insights
Analytical
Storage
/Lake
Query
Engine
Model
Engine
Model
Accessible API
Application
Data Sources Ingestion&Transformation Storage Query&Processing Consume
19
Data at
Rest
Data in
Motion Event/Data
Stream
Workflow Engine
Stream
Processor
Real-time
insights
Analytical
Storage
/Lake
Query
Engine
Model
Engine
Model
Accessible API
Application
Data Sources Ingestion&Transformation Storage Query&Processing Consume
Thank You!
@tamir_dresher

Mais conteúdo relacionado

Mais procurados

Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftAmazon Web Services
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
Apache Druid 101
Apache Druid 101Apache Druid 101
Apache Druid 101Data Con LA
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouseJames Serra
 
XStream: stream processing platform at facebook
XStream:  stream processing platform at facebookXStream:  stream processing platform at facebook
XStream: stream processing platform at facebookAniket Mokashi
 
Some Iceberg Basics for Beginners (CDP).pdf
Some Iceberg Basics for Beginners (CDP).pdfSome Iceberg Basics for Beginners (CDP).pdf
Some Iceberg Basics for Beginners (CDP).pdfMichael Kogan
 
Getting Started with Confluent Schema Registry
Getting Started with Confluent Schema RegistryGetting Started with Confluent Schema Registry
Getting Started with Confluent Schema Registryconfluent
 
Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)Eric Sun
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake OverviewJames Serra
 
Apache Solr-Webinar
Apache Solr-WebinarApache Solr-Webinar
Apache Solr-WebinarEdureka!
 
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar ZecevicDataScienceConferenc1
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta LakeDatabricks
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta LakeDatabricks
 
Data Source API in Spark
Data Source API in SparkData Source API in Spark
Data Source API in SparkDatabricks
 

Mais procurados (20)

Data engineering design patterns
Data engineering design patternsData engineering design patterns
Data engineering design patterns
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Apache Druid 101
Apache Druid 101Apache Druid 101
Apache Druid 101
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 
XStream: stream processing platform at facebook
XStream:  stream processing platform at facebookXStream:  stream processing platform at facebook
XStream: stream processing platform at facebook
 
Some Iceberg Basics for Beginners (CDP).pdf
Some Iceberg Basics for Beginners (CDP).pdfSome Iceberg Basics for Beginners (CDP).pdf
Some Iceberg Basics for Beginners (CDP).pdf
 
Getting Started with Confluent Schema Registry
Getting Started with Confluent Schema RegistryGetting Started with Confluent Schema Registry
Getting Started with Confluent Schema Registry
 
Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
Apache Solr-Webinar
Apache Solr-WebinarApache Solr-Webinar
Apache Solr-Webinar
 
Kafka: Internals
Kafka: InternalsKafka: Internals
Kafka: Internals
 
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta Lake
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Snowflake Datawarehouse Architecturing
Snowflake Datawarehouse ArchitecturingSnowflake Datawarehouse Architecturing
Snowflake Datawarehouse Architecturing
 
Modern Data Pipelines
Modern Data PipelinesModern Data Pipelines
Modern Data Pipelines
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
An Introduction to Druid
An Introduction to DruidAn Introduction to Druid
An Introduction to Druid
 
Data Source API in Spark
Data Source API in SparkData Source API in Spark
Data Source API in Spark
 

Semelhante a Anatomy of a data driven architecture - Tamir Dresher

Real time analytics at uber @ strata data 2019
Real time analytics at uber @ strata data 2019Real time analytics at uber @ strata data 2019
Real time analytics at uber @ strata data 2019Zhenxiao Luo
 
Towards sql for streams
Towards sql for streamsTowards sql for streams
Towards sql for streamsRadu Tudoran
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkTimothy Spann
 
Powering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta LakePowering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta LakeDatabricks
 
[WSO2Con USA 2018] The Rise of Streaming SQL
[WSO2Con USA 2018] The Rise of Streaming SQL[WSO2Con USA 2018] The Rise of Streaming SQL
[WSO2Con USA 2018] The Rise of Streaming SQLWSO2
 
MongoDB World 2019: Streaming ETL on the Shoulders of Giants
MongoDB World 2019: Streaming ETL on the Shoulders of GiantsMongoDB World 2019: Streaming ETL on the Shoulders of Giants
MongoDB World 2019: Streaming ETL on the Shoulders of GiantsMongoDB
 
Realtime Analytics on AWS
Realtime Analytics on AWSRealtime Analytics on AWS
Realtime Analytics on AWSSungmin Kim
 
Spark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with SparkSpark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with SparkSpark Summit
 
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureOtimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureLuan Moreno Medeiros Maciel
 
Building Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSetsBuilding Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSetsPat Patterson
 
Introduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterIntroduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterPaolo Castagna
 
Self-Service Data Ingestion Using NiFi, StreamSets & Kafka
Self-Service Data Ingestion Using NiFi, StreamSets & KafkaSelf-Service Data Ingestion Using NiFi, StreamSets & Kafka
Self-Service Data Ingestion Using NiFi, StreamSets & KafkaGuido Schmutz
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Databricks
 
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS Amazon Web Services LATAM
 
xGem Data Stream Processing
xGem Data Stream ProcessingxGem Data Stream Processing
xGem Data Stream ProcessingJorge Hirtz
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseDataWorks Summit
 
BBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.comBBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.comCedric Vidal
 
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase DataWorks Summit
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming VisualizationGuido Schmutz
 

Semelhante a Anatomy of a data driven architecture - Tamir Dresher (20)

Real time analytics at uber @ strata data 2019
Real time analytics at uber @ strata data 2019Real time analytics at uber @ strata data 2019
Real time analytics at uber @ strata data 2019
 
Towards sql for streams
Towards sql for streamsTowards sql for streams
Towards sql for streams
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
 
Powering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta LakePowering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta Lake
 
[WSO2Con USA 2018] The Rise of Streaming SQL
[WSO2Con USA 2018] The Rise of Streaming SQL[WSO2Con USA 2018] The Rise of Streaming SQL
[WSO2Con USA 2018] The Rise of Streaming SQL
 
The Rise of Streaming SQL
The Rise of Streaming SQLThe Rise of Streaming SQL
The Rise of Streaming SQL
 
MongoDB World 2019: Streaming ETL on the Shoulders of Giants
MongoDB World 2019: Streaming ETL on the Shoulders of GiantsMongoDB World 2019: Streaming ETL on the Shoulders of Giants
MongoDB World 2019: Streaming ETL on the Shoulders of Giants
 
Realtime Analytics on AWS
Realtime Analytics on AWSRealtime Analytics on AWS
Realtime Analytics on AWS
 
Spark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with SparkSpark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with Spark
 
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureOtimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
 
Building Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSetsBuilding Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSets
 
Introduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterIntroduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matter
 
Self-Service Data Ingestion Using NiFi, StreamSets & Kafka
Self-Service Data Ingestion Using NiFi, StreamSets & KafkaSelf-Service Data Ingestion Using NiFi, StreamSets & Kafka
Self-Service Data Ingestion Using NiFi, StreamSets & Kafka
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
 
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
 
xGem Data Stream Processing
xGem Data Stream ProcessingxGem Data Stream Processing
xGem Data Stream Processing
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
 
BBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.comBBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.com
 
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
 

Mais de Tamir Dresher

NET Aspire - NET Conf IL 2024 - Tamir Dresher.pdf
NET Aspire - NET Conf IL 2024 - Tamir Dresher.pdfNET Aspire - NET Conf IL 2024 - Tamir Dresher.pdf
NET Aspire - NET Conf IL 2024 - Tamir Dresher.pdfTamir Dresher
 
Tamir Dresher - DotNet 7 What's new.pptx
Tamir Dresher - DotNet 7 What's new.pptxTamir Dresher - DotNet 7 What's new.pptx
Tamir Dresher - DotNet 7 What's new.pptxTamir Dresher
 
Tamir Dresher - What’s new in ASP.NET Core 6
Tamir Dresher - What’s new in ASP.NET Core 6Tamir Dresher - What’s new in ASP.NET Core 6
Tamir Dresher - What’s new in ASP.NET Core 6Tamir Dresher
 
Tamir Dresher - Async Streams in C#
Tamir Dresher - Async Streams in C#Tamir Dresher - Async Streams in C#
Tamir Dresher - Async Streams in C#Tamir Dresher
 
Tamir Dresher Clarizen adventures with the wild GC during the holiday season
Tamir Dresher   Clarizen adventures with the wild GC during the holiday seasonTamir Dresher   Clarizen adventures with the wild GC during the holiday season
Tamir Dresher Clarizen adventures with the wild GC during the holiday seasonTamir Dresher
 
Debugging tricks you wish you knew Tamir Dresher - Odessa 2019
Debugging tricks you wish you knew   Tamir Dresher - Odessa 2019Debugging tricks you wish you knew   Tamir Dresher - Odessa 2019
Debugging tricks you wish you knew Tamir Dresher - Odessa 2019Tamir Dresher
 
From zero to hero with the actor model - Tamir Dresher - Odessa 2019
From zero to hero with the actor model  - Tamir Dresher - Odessa 2019From zero to hero with the actor model  - Tamir Dresher - Odessa 2019
From zero to hero with the actor model - Tamir Dresher - Odessa 2019Tamir Dresher
 
Tamir Dresher - Demystifying the Core of .NET Core
Tamir Dresher  - Demystifying the Core of .NET CoreTamir Dresher  - Demystifying the Core of .NET Core
Tamir Dresher - Demystifying the Core of .NET CoreTamir Dresher
 
Breaking the monolith to microservice with Docker and Kubernetes (k8s)
Breaking the monolith to microservice with Docker and Kubernetes (k8s)Breaking the monolith to microservice with Docker and Kubernetes (k8s)
Breaking the monolith to microservice with Docker and Kubernetes (k8s)Tamir Dresher
 
.Net december 2017 updates - Tamir Dresher
.Net december 2017 updates - Tamir Dresher.Net december 2017 updates - Tamir Dresher
.Net december 2017 updates - Tamir DresherTamir Dresher
 
Testing time and concurrency Rx
Testing time and concurrency RxTesting time and concurrency Rx
Testing time and concurrency RxTamir Dresher
 
Building responsive application with Rx - confoo - tamir dresher
Building responsive application with Rx - confoo - tamir dresherBuilding responsive application with Rx - confoo - tamir dresher
Building responsive application with Rx - confoo - tamir dresherTamir Dresher
 
.NET Debugging tricks you wish you knew tamir dresher
.NET Debugging tricks you wish you knew   tamir dresher.NET Debugging tricks you wish you knew   tamir dresher
.NET Debugging tricks you wish you knew tamir dresherTamir Dresher
 
From Zero to the Actor Model (With Akka.Net) - CodeMash2017 - Tamir Dresher
From Zero to the Actor Model (With Akka.Net) - CodeMash2017 - Tamir DresherFrom Zero to the Actor Model (With Akka.Net) - CodeMash2017 - Tamir Dresher
From Zero to the Actor Model (With Akka.Net) - CodeMash2017 - Tamir DresherTamir Dresher
 
Building responsive applications with Rx - CodeMash2017 - Tamir Dresher
Building responsive applications with Rx  - CodeMash2017 - Tamir DresherBuilding responsive applications with Rx  - CodeMash2017 - Tamir Dresher
Building responsive applications with Rx - CodeMash2017 - Tamir DresherTamir Dresher
 
Debugging tricks you wish you knew - Tamir Dresher
Debugging tricks you wish you knew  - Tamir DresherDebugging tricks you wish you knew  - Tamir Dresher
Debugging tricks you wish you knew - Tamir DresherTamir Dresher
 
Rx 101 - Tamir Dresher - Copenhagen .NET User Group
Rx 101  - Tamir Dresher - Copenhagen .NET User GroupRx 101  - Tamir Dresher - Copenhagen .NET User Group
Rx 101 - Tamir Dresher - Copenhagen .NET User GroupTamir Dresher
 
Cloud patterns - NDC Oslo 2016 - Tamir Dresher
Cloud patterns - NDC Oslo 2016 - Tamir DresherCloud patterns - NDC Oslo 2016 - Tamir Dresher
Cloud patterns - NDC Oslo 2016 - Tamir DresherTamir Dresher
 
Reactiveness All The Way - SW Architecture 2015 Conference
Reactiveness All The Way - SW Architecture 2015 ConferenceReactiveness All The Way - SW Architecture 2015 Conference
Reactiveness All The Way - SW Architecture 2015 ConferenceTamir Dresher
 
Rx 101 Codemotion Milan 2015 - Tamir Dresher
Rx 101   Codemotion Milan 2015 - Tamir DresherRx 101   Codemotion Milan 2015 - Tamir Dresher
Rx 101 Codemotion Milan 2015 - Tamir DresherTamir Dresher
 

Mais de Tamir Dresher (20)

NET Aspire - NET Conf IL 2024 - Tamir Dresher.pdf
NET Aspire - NET Conf IL 2024 - Tamir Dresher.pdfNET Aspire - NET Conf IL 2024 - Tamir Dresher.pdf
NET Aspire - NET Conf IL 2024 - Tamir Dresher.pdf
 
Tamir Dresher - DotNet 7 What's new.pptx
Tamir Dresher - DotNet 7 What's new.pptxTamir Dresher - DotNet 7 What's new.pptx
Tamir Dresher - DotNet 7 What's new.pptx
 
Tamir Dresher - What’s new in ASP.NET Core 6
Tamir Dresher - What’s new in ASP.NET Core 6Tamir Dresher - What’s new in ASP.NET Core 6
Tamir Dresher - What’s new in ASP.NET Core 6
 
Tamir Dresher - Async Streams in C#
Tamir Dresher - Async Streams in C#Tamir Dresher - Async Streams in C#
Tamir Dresher - Async Streams in C#
 
Tamir Dresher Clarizen adventures with the wild GC during the holiday season
Tamir Dresher   Clarizen adventures with the wild GC during the holiday seasonTamir Dresher   Clarizen adventures with the wild GC during the holiday season
Tamir Dresher Clarizen adventures with the wild GC during the holiday season
 
Debugging tricks you wish you knew Tamir Dresher - Odessa 2019
Debugging tricks you wish you knew   Tamir Dresher - Odessa 2019Debugging tricks you wish you knew   Tamir Dresher - Odessa 2019
Debugging tricks you wish you knew Tamir Dresher - Odessa 2019
 
From zero to hero with the actor model - Tamir Dresher - Odessa 2019
From zero to hero with the actor model  - Tamir Dresher - Odessa 2019From zero to hero with the actor model  - Tamir Dresher - Odessa 2019
From zero to hero with the actor model - Tamir Dresher - Odessa 2019
 
Tamir Dresher - Demystifying the Core of .NET Core
Tamir Dresher  - Demystifying the Core of .NET CoreTamir Dresher  - Demystifying the Core of .NET Core
Tamir Dresher - Demystifying the Core of .NET Core
 
Breaking the monolith to microservice with Docker and Kubernetes (k8s)
Breaking the monolith to microservice with Docker and Kubernetes (k8s)Breaking the monolith to microservice with Docker and Kubernetes (k8s)
Breaking the monolith to microservice with Docker and Kubernetes (k8s)
 
.Net december 2017 updates - Tamir Dresher
.Net december 2017 updates - Tamir Dresher.Net december 2017 updates - Tamir Dresher
.Net december 2017 updates - Tamir Dresher
 
Testing time and concurrency Rx
Testing time and concurrency RxTesting time and concurrency Rx
Testing time and concurrency Rx
 
Building responsive application with Rx - confoo - tamir dresher
Building responsive application with Rx - confoo - tamir dresherBuilding responsive application with Rx - confoo - tamir dresher
Building responsive application with Rx - confoo - tamir dresher
 
.NET Debugging tricks you wish you knew tamir dresher
.NET Debugging tricks you wish you knew   tamir dresher.NET Debugging tricks you wish you knew   tamir dresher
.NET Debugging tricks you wish you knew tamir dresher
 
From Zero to the Actor Model (With Akka.Net) - CodeMash2017 - Tamir Dresher
From Zero to the Actor Model (With Akka.Net) - CodeMash2017 - Tamir DresherFrom Zero to the Actor Model (With Akka.Net) - CodeMash2017 - Tamir Dresher
From Zero to the Actor Model (With Akka.Net) - CodeMash2017 - Tamir Dresher
 
Building responsive applications with Rx - CodeMash2017 - Tamir Dresher
Building responsive applications with Rx  - CodeMash2017 - Tamir DresherBuilding responsive applications with Rx  - CodeMash2017 - Tamir Dresher
Building responsive applications with Rx - CodeMash2017 - Tamir Dresher
 
Debugging tricks you wish you knew - Tamir Dresher
Debugging tricks you wish you knew  - Tamir DresherDebugging tricks you wish you knew  - Tamir Dresher
Debugging tricks you wish you knew - Tamir Dresher
 
Rx 101 - Tamir Dresher - Copenhagen .NET User Group
Rx 101  - Tamir Dresher - Copenhagen .NET User GroupRx 101  - Tamir Dresher - Copenhagen .NET User Group
Rx 101 - Tamir Dresher - Copenhagen .NET User Group
 
Cloud patterns - NDC Oslo 2016 - Tamir Dresher
Cloud patterns - NDC Oslo 2016 - Tamir DresherCloud patterns - NDC Oslo 2016 - Tamir Dresher
Cloud patterns - NDC Oslo 2016 - Tamir Dresher
 
Reactiveness All The Way - SW Architecture 2015 Conference
Reactiveness All The Way - SW Architecture 2015 ConferenceReactiveness All The Way - SW Architecture 2015 Conference
Reactiveness All The Way - SW Architecture 2015 Conference
 
Rx 101 Codemotion Milan 2015 - Tamir Dresher
Rx 101   Codemotion Milan 2015 - Tamir DresherRx 101   Codemotion Milan 2015 - Tamir Dresher
Rx 101 Codemotion Milan 2015 - Tamir Dresher
 

Último

AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfonteinmasabamasaba
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfkalichargn70th171
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...Nitya salvi
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionOnePlan Solutions
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsBert Jan Schrijver
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfVishalKumarJha10
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareJim McKeeth
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastPapp Krisztián
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesVictorSzoltysek
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech studentsHimanshiGarg82
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024Mind IT Systems
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park masabamasaba
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...masabamasaba
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 

Último (20)

AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisions
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 

Anatomy of a data driven architecture - Tamir Dresher

  • 1. Anatomy of a Data Driven Architecture @tamir_dresher System Architect @ Payoneer
  • 2. 2 System Architect @ @tamir_dresher Tamir Dresher My Books: Software Engineering Lecturer Ruppin Academic Center tamirdr@payoneer.com https://www.israelclouds.com/iasaisrael
  • 3. 3 The need for DATA Your Data Data-Driven Decision Making Data-powered product • What markets are leading and where can I expand? • What’s slowing my process? • Is there a correlation between the time invested in a sale and the income from the tenant? • What products should I recommend to this user? • Is this action fraudulent ? • Should I suggest a discount to this user to raise the chance for a purchase?
  • 4. 4 ETL vs. ELT vs. Streaming Transform LoadExtract Transactional DB (OLTP) Analytical DB (OLAP) LoadExtract Transactional DB (OLTP) Analytical DB (OLAP) / Storage Transform Events Event Stream Stream Processor Real-time insights
  • 5. 5 Lambda (λ) & Kappa (ϰ) Architectures Speed Layer Batch Layer Serving Layer Real-time views Batched views Query Data Lambda Speed/Streaming Layer Serving Layer Speed views QueryData Kappa Processing
  • 7. 7Cross Cutting and Ops The Data Pipeline Architecture Data Sources Ingestion & Transformation Storage Query & Processing Consumption (Visualization, Alerting, Insights) Discovery/Catalog/ Metadata Management Security Quality&Testing Observability
  • 8. 8 Data Sources Ingestion & Transformation Storage Query & Processing Consumption Databases Files Events APIs • CDC - change data capture • Object Storage • Structured/ Unstructured • Customer tracking • WebHooks • Domain Events • External Services • Enrichment
  • 9. 9 Data Sources Ingestion & Transformation Storage Query & Processing Consumption Data Integration Platforms • Connectors to sources and destinations • E.g. fivtran, stitchdata, rivery Data Sources
  • 10. 10 Data Sources Ingestion & Transformation Storage Query & Processing Consumption Data Integration Platforms • Connectors to sources and destinations • E.g. fivtran, stitch
  • 11. 11 Data Sources Ingestion & Transformation Storage Query & Processing Consumption Data Integration Platforms • Connectors to sources and destinations • E.g. fivtran, stitchdata, rivery Data Sources Customized batching/micro-batching • Spark jobs, Data libraries (Pandas, boto), Hive • Workflows – Airflow, Dagster, Luigi
  • 12. 12 Data Sources Ingestion & Transformation Storage Query & Processing Consumption Data Integration Platforms • Connectors to sources and destinations • E.g. fivtran, stitchdata, rivery Data Sources Customized batching/micro-batching • Spark jobs, Data libraries (panda, boto), Hive • Workflows – Airflow, Dagster, Luigi https://towardsdatascience.com/building-a-production-level-etl-pipeline-platform-using-apache-airflow-a4cf34203fbd
  • 13. 13 Data Sources Ingestion & Transformation Storage Query & Processing Consumption Data Integration Platforms • Connectors to sources and destinations • E.g. fivtran, stitchdata, rivery Data Sources Customized batching/micro-batching • Spark jobs, Data libraries (panda, boto), Hive • Workflows – Airflow, Dagster, Luigi Event streaming and processing • Messaging - Kafka, Pulsar, Kinesis, Event Hub • Processing – Spark, Flink, Samza, Kafka Streams, Azure Stream Analytics
  • 14. 14 Data Sources Ingestion & Transformation Storage Query & Processing Consumption Data Integration Platforms • Connectors to sources and destinations • E.g. fivtran, stitchdata, rivery Data Sources Customized batching/micro-batching • Spark jobs, Data libraries (panda, boto), Hive • Workflows – Airflow, Dagster, Luigi Event streaming and processing • Messaging - Kafka, Pulsar, Kinesis, Event Hub • Processing – Spark, Flink, Samza, Kafka Streams, Azure Stream Analytics -- Continuously aggregating a stream into a table with a ksqlDB push query. CREATE STREAM locationUpdatesStream ...; CREATE TABLE locationsPerUser AS SELECT username, COUNT(*) FROM locationUpdatesStream GROUP BY username EMIT CHANGES; // Continuously aggregating to table KStream<String, String> locationUpdatesStream = ...; KTable<String, Long> locationsPerUser = locationUpdatesStream .groupBy((k, v) -> v.username) .count(); https://www.confluent.io/blog/kafka-streams-tables-part-1-event-streaming/
  • 15. 15 Data Sources Ingestion & Transformation Storage Query & Processing Consumption Data Warehouse • Structured format • Designed to quickly generate insights based on SQL like queries • Modern cloud base offerings – Redshift, BigQuery, Snowflake, Azure Synapse Data Lake • Structured and non-structured data – CSV, Parquet, Images, Audio • Raw and historical data • Designed to be used by data scientists and create models by various languages
  • 16. 16 Data Sources Ingestion & Transformation Storage Query & Processing Consumption Predictive • Generate model • Data Science and ML libraries – Pandas, Numpy, R, scikit etc • The model is periodically refreshed Storage Retrospective (Historical) • Deriving Intelligence based on statistics • Built-in engine (Data Warehouse) OR • Query Engines – Presto, Impala Real Time Analytics • Run analytical queries over big volumes of data with interactive latencies. • Apache Pinot, Clickhouse, Druid Data Science Platforms • Helps managing the workflows, productization and operations - SageMaker, Iguazio, DataBricks etc.
  • 17. 17 Data Sources Ingestion & Transformation Storage Query & Processing Consumption Custom Apps • Execute the model – how is the model reachable? • Translate user/system actions to queries • Visualize – custom (Plotly Dash, Streamlit etc) or embedded (PowerBI, Looker etc) External Apps • Augmented Analytics - External services to generate insights and explain them (e.g. Anomaly Detection with Anodot/ CrunchMetrics/outlier.ai) • Customizable Reports and Dashboard (e.g. Looker, Tableu, PowerBI, Sisense)
  • 18. 18 Summary Data at Rest Data in Motion Event/Dat a Stream Workflow Engine Stream Processor Real-time insights Analytical Storage /Lake Query Engine Model Engine Model Accessible API Application Data Sources Ingestion&Transformation Storage Query&Processing Consume
  • 19. 19 Data at Rest Data in Motion Event/Data Stream Workflow Engine Stream Processor Real-time insights Analytical Storage /Lake Query Engine Model Engine Model Accessible API Application Data Sources Ingestion&Transformation Storage Query&Processing Consume Thank You! @tamir_dresher

Notas do Editor

  1. Data comes from many sources and feed our application We need it for the OLTP obviously but being the new gold we can use the data feed to and historical data to gain more insights BI ML/AI data-driven decision making (analytic systems) and drive data-powered products
  2. Traditionally , orgs moved the data from the OLTP to the OLAP (DWH) with ETL Modern architectures now relies on ELT for batched load And to get the best real time response streaming is the way to go
  3. Traditionally , orgs moved the data from the OLTP to the OLAP (DWH) with ETL Modern architectures now relies on ELT for batched load And to get the best real time response streaming is the way to go
  4. https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/how-to-build-a-data-architecture-to-drive-innovation-today-and-tomorrow#
  5. https://leventov.medium.com/comparison-of-the-open-source-olap-systems-for-big-data-clickhouse-druid-and-pinot-8e042a5ed1c7