Meetup - Brasil - Data In Motion - 2023 September 19

Timothy Spann
Timothy SpannDeveloper Advocate em StreamNative
Data in Motion:
Overview e Novidades do
NiFi, Kafka e Flink
Tim Spann - Principal Developer Advocate
Data In Motion
Meetup - Brasil - Data In Motion - 2023 September 19
3
© 2023 Cloudera, Inc. All rights reserved.
TODAY’S LEAD
Who am I?
@PaasDev
DZone Zone Leader and Big Data MVB
Princeton and NYC Future of Data Meetups
ex-Pivotal Field Engineer ex-StreamNative ex-PwC
https://github.com/tspannhw https://twitter.com/PaaSDev
https://www.datainmotion.dev/
https://medium.com/@tspann
Principal Data-in-Motion Developer Advocate
4
© 2023 Cloudera, Inc. All rights reserved.
Data in Motion: Overview e Novidades do NiFi, Kafka e Flink
Apresentador: Tim Spann - Principal DIM Specialist and Developer Advocate
Intro to NiFi
Intro to Kafka
Intro to Flink
Together as FLaNK
Demos
Q&A
© 2023 Cloudera, Inc. All rights reserved. 5
REAL-TIME REQUIRES A PLATFORM
SQL
Stream
Builder
© 2023 Cloudera, Inc. All rights reserved. 6
REST API ARCHITECTURE - Using FLaNK to pull the data out of anything in near-real time
INGEST PREPARE PUBLISH
DATA SOURCES
Internal Users
(After Sales)
External
Systems
ENTERPRISE
LAKEHOUSE
CAPABILITY VIEW
INGESTION
MESSAGE HUB
STORAGE
BATCH
MANAGEMENT
STREAM
CONSUMPTION
Closed Loop
Systems
SQL Stream Builder
Machine Learning
Data Visualization
Workload Manager
watsonx.data
Cloudera DataFlow - Apache NiFi
© 2019 Cloudera, Inc. All rights reserved. 8
CLOUDERA DATAFLOW - POWERED BY APACHE NiFi
Ingest and manage data from edge-to-cloud using a no-code interface
● #1 data ingestion/movement engine
● Strong community
● Product maturity over 11 years
● Deploy on-premises or in the cloud
● Over 400+ pre-built processors
● Built-in data provenance
● Guaranteed delivery
● Throttling and Back pressure
© 2023 Cloudera, Inc. All rights reserved. 9
PROVENANCE
10
© 2023 Cloudera, Inc. All rights reserved.
RECORD-ORIENTED DATA WITH NIFI
• Record Readers - Avro, CSV, Grok, IPFIX, JSAN1, JSON, Parquet,
Scripted, Syslog5424, Syslog, WindowsEvent, XML
• Record Writers - Avro, CSV, FreeFromText, Json, Parquet,
Scripted, XML
• Record Reader and Writer support referencing a schema registry
for retrieving schemas when necessary.
• Enable processors that accept any data format without having to
worry about the parsing and serialization logic.
• Allows us to keep FlowFiles larger, each consisting of multiple
records, which results in far better performance.
11
© 2023 Cloudera, Inc. All rights reserved.
RUNNING SQL ON FLOWFILES
• Evaluates one or more SQL queries against the contents of a
FlowFile.
• This can be used, for example, for field-specific filtering,
transformation, and row-level filtering.
• Columns can be renamed, simple calculations and aggregations
performed.
• The SQL statement must be valid ANSI SQL and is powered by
Apache Calcite.
12
© 2023 Cloudera, Inc. All rights reserved.
READYFLOW
GALLERY
• Cloudera provided flow
definitions
• Cover most common data flow
use cases
• Optimized to work with CDP
sources/destinations
• Can be deployed and adjusted
as needed
Cloudera Streams Messaging
Manager - Apache Kafka
14
© 2023 Cloudera, Inc. All rights reserved.
STREAMS MESSAGING WITH KAFKA
• Highly reliable distributed messaging system.
• Decouple applications, enables many-to-many
patterns.
• Publish-Subscribe semantics.
• Horizontal scalability.
• Efficient implementation to operate at speed with
big data volumes.
• Organized by topic to support several use cases.
Cloudera SQL Stream Builder - Flink
SQL
16
© 2023 Cloudera, Inc. All rights reserved.
DELIVERING STREAMING ANALYTICS
1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. (second)
SQL
Parsing and
Blending Data
Streaming
Analytics
Both offline and
streaming data
Data Analysts Can
Write Queries
Across the Lines of Business
Capture Events
that Matter
Low-latency analytics use
cases
Events
Processing
17
© 2022 Cloudera, Inc. All rights reserved.
SQL STREAM BUILDER (SSB)
SQL STREAM BUILDER allows
developers, analysts, and data
scientists to write streaming
applications with industry
standard SQL.
No Java or Scala code
development required.
Simplifies access to data in Kafka
& Flink. Connectors to batch data in
HDFS, Kudu, Hive, S3, JDBC, CDC
and more
Enrich streaming data with batch
data in a single tool
Democratize access to real-time data with just SQL
18
© 2023 Cloudera, Inc. All rights reserved.
SSB MATERIALIZED VIEWS
Key Takeaway; MV’s allow data scientist, analyst and developers consume data from the firehose
Demo
20
© 2023 Cloudera, Inc. All rights reserved.
Data in Motion: Overview e Novidades do NiFi, Kafka e Flink
Apresentador: Tim Spann - Principal DIM Specialist and Developer Advocate
21
© 2023 Cloudera, Inc. All rights reserved.
FREE LEARNING ENVIRONMENT
23
© 2023 Cloudera, Inc. All rights reserved.
Cloudera Streams
Processing -
Community Edition
• Kafka, KConnect, SMM, SR,
Flink, and SSB in Docker
• Runs in Docker
• Try new features quickly
• Develop applications locally
● Docker compose file of CSP to run from command line w/o any
dependencies, including Flink, SQL Stream Builder, Kafka, Kafka
Connect, Streams Messaging Manager and Schema Registry
○ $> docker compose up
● Licensed under the Cloudera Community License
● Unsupported
● Community Group Hub for CSP
● Find it on docs.cloudera.com under Applications
Open Source Edition
• Apache NiFi in Docker
• Runs in Docker
• Try new features
quickly
• Develop applications
locally
● Docker NiFi
○ docker run --name nifi -p 8443:8443 -d -e
SINGLE_USER_CREDENTIALS_USERNAME=admin -e
SINGLE_USER_CREDENTIALS_PASSWORD=ctsBtRBKHRAx69EqUgh
vvgEvjnaLjFEB apache/nifi:latest
● Licensed under the ASF License
● Unsupported
https://hub.docker.com/r/apache/nifi
RESOURCES, WRAP-UP, Q&A
© 2023 Cloudera, Inc. All rights reserved. 26
Future of Data - NYC / Princeton + Virtual
@PaasDev
https://www.meetup.com/futureofdata-princeton/
https://www.meetup.com/futureofdata-newyork/
From Big Data to AI to Streaming to LLM to Cloud to
Analytics to NLP to Fast Data to Machine Learning to
Microservices to ...
https://medium.com/cloudera-inc/streaming-llm-with-apache-nifi-huggin
gface-ad2f0d367468
28
© 2023 Cloudera, Inc. All rights reserved.
Streaming Resources
• https://dzone.com/articles/real-time-stream-processing-with-hazelcast-an
d-streamnative
• https://flipstackweekly.com/
• https://www.datainmotion.dev/
• https://www.flankstack.dev/
• https://github.com/tspannhw
• https://medium.com/@tspann
• https://medium.com/@tspann/predictions-for-streaming-in-2023-ad4d739
5d714
• https://www.apachecon.com/acna2022/slides/04_Spann_Tim_Citizen_Str
eaming_Engineer.pdf
© 2023 Cloudera, Inc. All rights reserved. 29
FLaNK Stack Weekly
This week in Apache NiFi, Apache Flink, Apache
Kafka, Apache Spark, Apache Iceberg, Python,
Java and Open Source friends.
https://bit.ly/32dAJft
Generative AI
https://github.com/tspannhw/FLaNK-HuggingFace-DistilBert-SentimentAnalysis
https://github.com/tspannhw/FLaNK-LLM
watsonx.ai
LLM USE CASE
Vector DB
AI Model
Unstructured file types
Data in Motion
on Cloudera Data
Platform (CDP)
Capture, process &
distribute any data,
anywhere
Other enterprise data Open Data Lakehouse
Materialized Views
Structured Sources
Applications/API’s
Streams
Meetup - Brasil - Data In Motion - 2023 September 19
33
© 2023 Cloudera, Inc. All rights reserved.
TH N Y U
1 de 33

Recomendados

OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf por
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdfOSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdfTimothy Spann
23 visualizações43 slides
GSJUG: Mastering Data Streaming Pipelines 09May2023 por
GSJUG: Mastering Data Streaming Pipelines 09May2023GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023Timothy Spann
255 visualizações80 slides
Building Real-Time Travel Alerts por
Building Real-Time Travel AlertsBuilding Real-Time Travel Alerts
Building Real-Time Travel AlertsTimothy Spann
111 visualizações48 slides
JConWorld_ Continuous SQL with Kafka and Flink por
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkTimothy Spann
103 visualizações36 slides
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data por
Building Real-time Pipelines with FLaNK_ A Case Study with Transit DataBuilding Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit DataTimothy Spann
193 visualizações45 slides
ITPC Building Modern Data Streaming Apps por
ITPC Building Modern Data Streaming AppsITPC Building Modern Data Streaming Apps
ITPC Building Modern Data Streaming AppsTimothy Spann
797 visualizações64 slides

Mais conteúdo relacionado

Similar a Meetup - Brasil - Data In Motion - 2023 September 19

The Never Landing Stream with HTAP and Streaming por
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingTimothy Spann
254 visualizações39 slides
Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp por
Using Apache NiFi with Apache Pulsar for Fast Data On-RampUsing Apache NiFi with Apache Pulsar for Fast Data On-Ramp
Using Apache NiFi with Apache Pulsar for Fast Data On-RampTimothy Spann
163 visualizações27 slides
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023 por
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023ssuser73434e
54 visualizações79 slides
Best Practices For Workflow por
Best Practices For WorkflowBest Practices For Workflow
Best Practices For WorkflowTimothy Spann
89 visualizações86 slides
Kafka for DBAs por
Kafka for DBAsKafka for DBAs
Kafka for DBAsGwen (Chen) Shapira
12.8K visualizações41 slides
Introduction to Apache NiFi 1.10 por
Introduction to Apache NiFi 1.10Introduction to Apache NiFi 1.10
Introduction to Apache NiFi 1.10Timothy Spann
2K visualizações24 slides

Similar a Meetup - Brasil - Data In Motion - 2023 September 19(20)

The Never Landing Stream with HTAP and Streaming por Timothy Spann
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and Streaming
Timothy Spann254 visualizações
Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp por Timothy Spann
Using Apache NiFi with Apache Pulsar for Fast Data On-RampUsing Apache NiFi with Apache Pulsar for Fast Data On-Ramp
Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp
Timothy Spann163 visualizações
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023 por ssuser73434e
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
ssuser73434e54 visualizações
Best Practices For Workflow por Timothy Spann
Best Practices For WorkflowBest Practices For Workflow
Best Practices For Workflow
Timothy Spann89 visualizações
Kafka for DBAs por Gwen (Chen) Shapira
Kafka for DBAsKafka for DBAs
Kafka for DBAs
Gwen (Chen) Shapira12.8K visualizações
Introduction to Apache NiFi 1.10 por Timothy Spann
Introduction to Apache NiFi 1.10Introduction to Apache NiFi 1.10
Introduction to Apache NiFi 1.10
Timothy Spann2K visualizações
Meet the Committers Webinar_ Lab Preparation por Timothy Spann
Meet the Committers Webinar_ Lab PreparationMeet the Committers Webinar_ Lab Preparation
Meet the Committers Webinar_ Lab Preparation
Timothy Spann32 visualizações
CoC23_Utilizing Real-Time Transit Data for Travel Optimization por Timothy Spann
CoC23_Utilizing Real-Time Transit Data for Travel OptimizationCoC23_Utilizing Real-Time Transit Data for Travel Optimization
CoC23_Utilizing Real-Time Transit Data for Travel Optimization
Timothy Spann30 visualizações
Reinventing Kafka in the Data Streaming Era - Jun Rao por confluent
Reinventing Kafka in the Data Streaming Era - Jun RaoReinventing Kafka in the Data Streaming Era - Jun Rao
Reinventing Kafka in the Data Streaming Era - Jun Rao
confluent139 visualizações
Part 2: A Visual Dive into Machine Learning and Deep Learning 
 por Cloudera, Inc.
Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 

Cloudera, Inc.1.5K visualizações
spring-cloud.pptx por ssuser7959eb
spring-cloud.pptxspring-cloud.pptx
spring-cloud.pptx
ssuser7959eb11 visualizações
PortoTechHub - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends por Timothy Spann
PortoTechHub  - Hail Hydrate! From Stream to Lake with Apache Pulsar and FriendsPortoTechHub  - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
PortoTechHub - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
Timothy Spann986 visualizações
Best Practices for Building Hybrid-Cloud Architectures | Hans Jespersen por confluent
Best Practices for Building Hybrid-Cloud Architectures | Hans JespersenBest Practices for Building Hybrid-Cloud Architectures | Hans Jespersen
Best Practices for Building Hybrid-Cloud Architectures | Hans Jespersen
confluent403 visualizações
Live Demo Jam Expands: The Leading-Edge Streaming Data Platform with NiFi, Ka... por Timothy Spann
Live Demo Jam Expands: The Leading-Edge Streaming Data Platform with NiFi, Ka...Live Demo Jam Expands: The Leading-Edge Streaming Data Platform with NiFi, Ka...
Live Demo Jam Expands: The Leading-Edge Streaming Data Platform with NiFi, Ka...
Timothy Spann518 visualizações
Spark One Platform Webinar por Cloudera, Inc.
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform Webinar
Cloudera, Inc.2.5K visualizações
Music city data Hail Hydrate! from stream to lake por Timothy Spann
Music city data Hail Hydrate! from stream to lakeMusic city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lake
Timothy Spann708 visualizações
Webinar | Better Together: Apache Cassandra and Apache Kafka por DataStax
Webinar  |  Better Together: Apache Cassandra and Apache KafkaWebinar  |  Better Together: Apache Cassandra and Apache Kafka
Webinar | Better Together: Apache Cassandra and Apache Kafka
DataStax880 visualizações
End to End Streaming Architectures por Cloudera, Inc.
End to End Streaming ArchitecturesEnd to End Streaming Architectures
End to End Streaming Architectures
Cloudera, Inc.3.5K visualizações
Kubernetes connectivity to Cloud Native Kafka | Evan Shortiss and Hugo Guerre... por HostedbyConfluent
Kubernetes connectivity to Cloud Native Kafka | Evan Shortiss and Hugo Guerre...Kubernetes connectivity to Cloud Native Kafka | Evan Shortiss and Hugo Guerre...
Kubernetes connectivity to Cloud Native Kafka | Evan Shortiss and Hugo Guerre...
HostedbyConfluent324 visualizações
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic... por HostedbyConfluent
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
HostedbyConfluent388 visualizações

Mais de Timothy Spann

[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines por
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data PipelinesTimothy Spann
96 visualizações25 slides
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo por
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoEvolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoTimothy Spann
158 visualizações8 slides
CoC23_ Looking at the New Features of Apache NiFi por
CoC23_ Looking at the New Features of Apache NiFiCoC23_ Looking at the New Features of Apache NiFi
CoC23_ Looking at the New Features of Apache NiFiTimothy Spann
36 visualizações24 slides
CoC23_ Let’s Monitor The Conditions at the Conference por
CoC23_ Let’s Monitor The Conditions at the ConferenceCoC23_ Let’s Monitor The Conditions at the Conference
CoC23_ Let’s Monitor The Conditions at the ConferenceTimothy Spann
17 visualizações17 slides
Implement a Universal Data Distribution Architecture to Manage All Streaming ... por
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Timothy Spann
28 visualizações56 slides
big data fest building modern data streaming apps por
big data fest building modern data streaming appsbig data fest building modern data streaming apps
big data fest building modern data streaming appsTimothy Spann
317 visualizações55 slides

Mais de Timothy Spann(17)

[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines por Timothy Spann
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
Timothy Spann96 visualizações
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo por Timothy Spann
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoEvolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Timothy Spann158 visualizações
CoC23_ Looking at the New Features of Apache NiFi por Timothy Spann
CoC23_ Looking at the New Features of Apache NiFiCoC23_ Looking at the New Features of Apache NiFi
CoC23_ Looking at the New Features of Apache NiFi
Timothy Spann36 visualizações
CoC23_ Let’s Monitor The Conditions at the Conference por Timothy Spann
CoC23_ Let’s Monitor The Conditions at the ConferenceCoC23_ Let’s Monitor The Conditions at the Conference
CoC23_ Let’s Monitor The Conditions at the Conference
Timothy Spann17 visualizações
Implement a Universal Data Distribution Architecture to Manage All Streaming ... por Timothy Spann
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Timothy Spann28 visualizações
big data fest building modern data streaming apps por Timothy Spann
big data fest building modern data streaming appsbig data fest building modern data streaming apps
big data fest building modern data streaming apps
Timothy Spann317 visualizações
BestInFlowCompetitionTutorials03May2023 por Timothy Spann
BestInFlowCompetitionTutorials03May2023BestInFlowCompetitionTutorials03May2023
BestInFlowCompetitionTutorials03May2023
Timothy Spann11 visualizações
CloudToolGuidance03May2023 por Timothy Spann
CloudToolGuidance03May2023CloudToolGuidance03May2023
CloudToolGuidance03May2023
Timothy Spann10 visualizações
Cloudera Sandbox Event Guidelines For Workflow por Timothy Spann
Cloudera Sandbox Event Guidelines For WorkflowCloudera Sandbox Event Guidelines For Workflow
Cloudera Sandbox Event Guidelines For Workflow
Timothy Spann32 visualizações
DevNexus: Apache Pulsar Development 101 with Java por Timothy Spann
DevNexus:  Apache Pulsar Development 101 with JavaDevNexus:  Apache Pulsar Development 101 with Java
DevNexus: Apache Pulsar Development 101 with Java
Timothy Spann261 visualizações
Conf42 Python_ ML Enhanced Event Streaming Apps with Python Microservices por Timothy Spann
Conf42 Python_ ML Enhanced Event Streaming Apps with Python MicroservicesConf42 Python_ ML Enhanced Event Streaming Apps with Python Microservices
Conf42 Python_ ML Enhanced Event Streaming Apps with Python Microservices
Timothy Spann443 visualizações
PythonWebConference_ Cloud Native Apache Pulsar Development 202 with Python por Timothy Spann
PythonWebConference_ Cloud Native Apache Pulsar Development 202 with PythonPythonWebConference_ Cloud Native Apache Pulsar Development 202 with Python
PythonWebConference_ Cloud Native Apache Pulsar Development 202 with Python
Timothy Spann430 visualizações
PhillyJug Getting Started With Real-time Cloud Native Streaming With Java por Timothy Spann
PhillyJug  Getting Started With Real-time Cloud Native Streaming With JavaPhillyJug  Getting Started With Real-time Cloud Native Streaming With Java
PhillyJug Getting Started With Real-time Cloud Native Streaming With Java
Timothy Spann625 visualizações
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud) por Timothy Spann
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)
Timothy Spann17 visualizações
Living the Stream Dream with Pulsar and Spring Boot por Timothy Spann
Living the Stream Dream with Pulsar and Spring BootLiving the Stream Dream with Pulsar and Spring Boot
Living the Stream Dream with Pulsar and Spring Boot
Timothy Spann92 visualizações
Let's keep it simple and streaming por Timothy Spann
Let's keep it simple and streamingLet's keep it simple and streaming
Let's keep it simple and streaming
Timothy Spann19 visualizações
Sink Your Teeth into Streaming at Any Scale por Timothy Spann
Sink Your Teeth into Streaming at Any ScaleSink Your Teeth into Streaming at Any Scale
Sink Your Teeth into Streaming at Any Scale
Timothy Spann12 visualizações

Último

ColonyOS por
ColonyOSColonyOS
ColonyOSJohanKristiansson6
9 visualizações17 slides
Introduction to Microsoft Fabric.pdf por
Introduction to Microsoft Fabric.pdfIntroduction to Microsoft Fabric.pdf
Introduction to Microsoft Fabric.pdfishaniuudeshika
29 visualizações16 slides
Vikas 500 BIG DATA TECHNOLOGIES LAB.pdf por
Vikas 500 BIG DATA TECHNOLOGIES LAB.pdfVikas 500 BIG DATA TECHNOLOGIES LAB.pdf
Vikas 500 BIG DATA TECHNOLOGIES LAB.pdfvikas12611618
8 visualizações30 slides
Cross-network in Google Analytics 4.pdf por
Cross-network in Google Analytics 4.pdfCross-network in Google Analytics 4.pdf
Cross-network in Google Analytics 4.pdfGA4 Tutorials
6 visualizações7 slides
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx por
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptxDataScienceConferenc1
5 visualizações12 slides
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation por
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented GenerationDataScienceConferenc1
7 visualizações29 slides

Último(20)

Introduction to Microsoft Fabric.pdf por ishaniuudeshika
Introduction to Microsoft Fabric.pdfIntroduction to Microsoft Fabric.pdf
Introduction to Microsoft Fabric.pdf
ishaniuudeshika29 visualizações
Vikas 500 BIG DATA TECHNOLOGIES LAB.pdf por vikas12611618
Vikas 500 BIG DATA TECHNOLOGIES LAB.pdfVikas 500 BIG DATA TECHNOLOGIES LAB.pdf
Vikas 500 BIG DATA TECHNOLOGIES LAB.pdf
vikas126116188 visualizações
Cross-network in Google Analytics 4.pdf por GA4 Tutorials
Cross-network in Google Analytics 4.pdfCross-network in Google Analytics 4.pdf
Cross-network in Google Analytics 4.pdf
GA4 Tutorials6 visualizações
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx por DataScienceConferenc1
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx
DataScienceConferenc15 visualizações
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation por DataScienceConferenc1
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
DataScienceConferenc17 visualizações
RuleBookForTheFairDataEconomy.pptx por noraelstela1
RuleBookForTheFairDataEconomy.pptxRuleBookForTheFairDataEconomy.pptx
RuleBookForTheFairDataEconomy.pptx
noraelstela167 visualizações
Survey on Factuality in LLM's.pptx por NeethaSherra1
Survey on Factuality in LLM's.pptxSurvey on Factuality in LLM's.pptx
Survey on Factuality in LLM's.pptx
NeethaSherra15 visualizações
Data structure and algorithm. por Abdul salam
Data structure and algorithm. Data structure and algorithm.
Data structure and algorithm.
Abdul salam 19 visualizações
Short Story Assignment by Kelly Nguyen por kellynguyen01
Short Story Assignment by Kelly NguyenShort Story Assignment by Kelly Nguyen
Short Story Assignment by Kelly Nguyen
kellynguyen0119 visualizações
PROGRAMME.pdf por HiNedHaJar
PROGRAMME.pdfPROGRAMME.pdf
PROGRAMME.pdf
HiNedHaJar18 visualizações
3196 The Case of The East River por ErickANDRADE90
3196 The Case of The East River3196 The Case of The East River
3196 The Case of The East River
ErickANDRADE9011 visualizações
Advanced_Recommendation_Systems_Presentation.pptx por neeharikasingh29
Advanced_Recommendation_Systems_Presentation.pptxAdvanced_Recommendation_Systems_Presentation.pptx
Advanced_Recommendation_Systems_Presentation.pptx
neeharikasingh295 visualizações
UNEP FI CRS Climate Risk Results.pptx por pekka28
UNEP FI CRS Climate Risk Results.pptxUNEP FI CRS Climate Risk Results.pptx
UNEP FI CRS Climate Risk Results.pptx
pekka2811 visualizações
Chapter 3b- Process Communication (1) (1)(1) (1).pptx por ayeshabaig2004
Chapter 3b- Process Communication (1) (1)(1) (1).pptxChapter 3b- Process Communication (1) (1)(1) (1).pptx
Chapter 3b- Process Communication (1) (1)(1) (1).pptx
ayeshabaig20045 visualizações
CRIJ4385_Death Penalty_F23.pptx por yvettemm100
CRIJ4385_Death Penalty_F23.pptxCRIJ4385_Death Penalty_F23.pptx
CRIJ4385_Death Penalty_F23.pptx
yvettemm1006 visualizações
RIO GRANDE SUPPLY COMPANY INC, JAYSON.docx por JaysonGarabilesEspej
RIO GRANDE SUPPLY COMPANY INC, JAYSON.docxRIO GRANDE SUPPLY COMPANY INC, JAYSON.docx
RIO GRANDE SUPPLY COMPANY INC, JAYSON.docx
JaysonGarabilesEspej6 visualizações
Understanding Hallucinations in LLMs - 2023 09 29.pptx por Greg Makowski
Understanding Hallucinations in LLMs - 2023 09 29.pptxUnderstanding Hallucinations in LLMs - 2023 09 29.pptx
Understanding Hallucinations in LLMs - 2023 09 29.pptx
Greg Makowski17 visualizações
Supercharging your Data with Azure AI Search and Azure OpenAI por Peter Gallagher
Supercharging your Data with Azure AI Search and Azure OpenAISupercharging your Data with Azure AI Search and Azure OpenAI
Supercharging your Data with Azure AI Search and Azure OpenAI
Peter Gallagher37 visualizações

Meetup - Brasil - Data In Motion - 2023 September 19

  • 1. Data in Motion: Overview e Novidades do NiFi, Kafka e Flink Tim Spann - Principal Developer Advocate Data In Motion
  • 3. 3 © 2023 Cloudera, Inc. All rights reserved. TODAY’S LEAD Who am I? @PaasDev DZone Zone Leader and Big Data MVB Princeton and NYC Future of Data Meetups ex-Pivotal Field Engineer ex-StreamNative ex-PwC https://github.com/tspannhw https://twitter.com/PaaSDev https://www.datainmotion.dev/ https://medium.com/@tspann Principal Data-in-Motion Developer Advocate
  • 4. 4 © 2023 Cloudera, Inc. All rights reserved. Data in Motion: Overview e Novidades do NiFi, Kafka e Flink Apresentador: Tim Spann - Principal DIM Specialist and Developer Advocate Intro to NiFi Intro to Kafka Intro to Flink Together as FLaNK Demos Q&A
  • 5. © 2023 Cloudera, Inc. All rights reserved. 5 REAL-TIME REQUIRES A PLATFORM SQL Stream Builder
  • 6. © 2023 Cloudera, Inc. All rights reserved. 6 REST API ARCHITECTURE - Using FLaNK to pull the data out of anything in near-real time INGEST PREPARE PUBLISH DATA SOURCES Internal Users (After Sales) External Systems ENTERPRISE LAKEHOUSE CAPABILITY VIEW INGESTION MESSAGE HUB STORAGE BATCH MANAGEMENT STREAM CONSUMPTION Closed Loop Systems SQL Stream Builder Machine Learning Data Visualization Workload Manager watsonx.data
  • 7. Cloudera DataFlow - Apache NiFi
  • 8. © 2019 Cloudera, Inc. All rights reserved. 8 CLOUDERA DATAFLOW - POWERED BY APACHE NiFi Ingest and manage data from edge-to-cloud using a no-code interface ● #1 data ingestion/movement engine ● Strong community ● Product maturity over 11 years ● Deploy on-premises or in the cloud ● Over 400+ pre-built processors ● Built-in data provenance ● Guaranteed delivery ● Throttling and Back pressure
  • 9. © 2023 Cloudera, Inc. All rights reserved. 9 PROVENANCE
  • 10. 10 © 2023 Cloudera, Inc. All rights reserved. RECORD-ORIENTED DATA WITH NIFI • Record Readers - Avro, CSV, Grok, IPFIX, JSAN1, JSON, Parquet, Scripted, Syslog5424, Syslog, WindowsEvent, XML • Record Writers - Avro, CSV, FreeFromText, Json, Parquet, Scripted, XML • Record Reader and Writer support referencing a schema registry for retrieving schemas when necessary. • Enable processors that accept any data format without having to worry about the parsing and serialization logic. • Allows us to keep FlowFiles larger, each consisting of multiple records, which results in far better performance.
  • 11. 11 © 2023 Cloudera, Inc. All rights reserved. RUNNING SQL ON FLOWFILES • Evaluates one or more SQL queries against the contents of a FlowFile. • This can be used, for example, for field-specific filtering, transformation, and row-level filtering. • Columns can be renamed, simple calculations and aggregations performed. • The SQL statement must be valid ANSI SQL and is powered by Apache Calcite.
  • 12. 12 © 2023 Cloudera, Inc. All rights reserved. READYFLOW GALLERY • Cloudera provided flow definitions • Cover most common data flow use cases • Optimized to work with CDP sources/destinations • Can be deployed and adjusted as needed
  • 14. 14 © 2023 Cloudera, Inc. All rights reserved. STREAMS MESSAGING WITH KAFKA • Highly reliable distributed messaging system. • Decouple applications, enables many-to-many patterns. • Publish-Subscribe semantics. • Horizontal scalability. • Efficient implementation to operate at speed with big data volumes. • Organized by topic to support several use cases.
  • 15. Cloudera SQL Stream Builder - Flink SQL
  • 16. 16 © 2023 Cloudera, Inc. All rights reserved. DELIVERING STREAMING ANALYTICS 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. (second) SQL Parsing and Blending Data Streaming Analytics Both offline and streaming data Data Analysts Can Write Queries Across the Lines of Business Capture Events that Matter Low-latency analytics use cases Events Processing
  • 17. 17 © 2022 Cloudera, Inc. All rights reserved. SQL STREAM BUILDER (SSB) SQL STREAM BUILDER allows developers, analysts, and data scientists to write streaming applications with industry standard SQL. No Java or Scala code development required. Simplifies access to data in Kafka & Flink. Connectors to batch data in HDFS, Kudu, Hive, S3, JDBC, CDC and more Enrich streaming data with batch data in a single tool Democratize access to real-time data with just SQL
  • 18. 18 © 2023 Cloudera, Inc. All rights reserved. SSB MATERIALIZED VIEWS Key Takeaway; MV’s allow data scientist, analyst and developers consume data from the firehose
  • 19. Demo
  • 20. 20 © 2023 Cloudera, Inc. All rights reserved. Data in Motion: Overview e Novidades do NiFi, Kafka e Flink Apresentador: Tim Spann - Principal DIM Specialist and Developer Advocate
  • 21. 21 © 2023 Cloudera, Inc. All rights reserved.
  • 23. 23 © 2023 Cloudera, Inc. All rights reserved. Cloudera Streams Processing - Community Edition • Kafka, KConnect, SMM, SR, Flink, and SSB in Docker • Runs in Docker • Try new features quickly • Develop applications locally ● Docker compose file of CSP to run from command line w/o any dependencies, including Flink, SQL Stream Builder, Kafka, Kafka Connect, Streams Messaging Manager and Schema Registry ○ $> docker compose up ● Licensed under the Cloudera Community License ● Unsupported ● Community Group Hub for CSP ● Find it on docs.cloudera.com under Applications
  • 24. Open Source Edition • Apache NiFi in Docker • Runs in Docker • Try new features quickly • Develop applications locally ● Docker NiFi ○ docker run --name nifi -p 8443:8443 -d -e SINGLE_USER_CREDENTIALS_USERNAME=admin -e SINGLE_USER_CREDENTIALS_PASSWORD=ctsBtRBKHRAx69EqUgh vvgEvjnaLjFEB apache/nifi:latest ● Licensed under the ASF License ● Unsupported https://hub.docker.com/r/apache/nifi
  • 26. © 2023 Cloudera, Inc. All rights reserved. 26 Future of Data - NYC / Princeton + Virtual @PaasDev https://www.meetup.com/futureofdata-princeton/ https://www.meetup.com/futureofdata-newyork/ From Big Data to AI to Streaming to LLM to Cloud to Analytics to NLP to Fast Data to Machine Learning to Microservices to ...
  • 28. 28 © 2023 Cloudera, Inc. All rights reserved. Streaming Resources • https://dzone.com/articles/real-time-stream-processing-with-hazelcast-an d-streamnative • https://flipstackweekly.com/ • https://www.datainmotion.dev/ • https://www.flankstack.dev/ • https://github.com/tspannhw • https://medium.com/@tspann • https://medium.com/@tspann/predictions-for-streaming-in-2023-ad4d739 5d714 • https://www.apachecon.com/acna2022/slides/04_Spann_Tim_Citizen_Str eaming_Engineer.pdf
  • 29. © 2023 Cloudera, Inc. All rights reserved. 29 FLaNK Stack Weekly This week in Apache NiFi, Apache Flink, Apache Kafka, Apache Spark, Apache Iceberg, Python, Java and Open Source friends. https://bit.ly/32dAJft
  • 31. LLM USE CASE Vector DB AI Model Unstructured file types Data in Motion on Cloudera Data Platform (CDP) Capture, process & distribute any data, anywhere Other enterprise data Open Data Lakehouse Materialized Views Structured Sources Applications/API’s Streams
  • 33. 33 © 2023 Cloudera, Inc. All rights reserved. TH N Y U