Structured Streaming in Spark

Digital Vidya
Vikram Agrawal
Qubole
About Me
● Studied Computer Science and Engineering at IIT Delhi
● Co-founded a web conferencing solution company before joining Qubole
● In the last 5 years at Qubole, I have worn multiple hats and worked across stacks to
provide big-data solutions over cloud
● Currently leading the Streaming Team at Qubole
Who should watch this?
● Big Data Engineer (DevOps, Architect, Software Engineer, Admin)
● Data Platform Manager
● Big Data Enthusiast (Consultant, Executive, Data User, Analyst)
How is streaming used in production?
● Identifying sessions based on user behavior from real-time activity streams
● Anomaly and fraud detection: running ML predictions on streaming data and
keeping the model continuously updated as new data comes in
● Time-based window aggregations: using window functions to do associative
aggregations and compute real-time statistics
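The time-based window aggregation mentioned above could look roughly like the sketch below. It is only an illustration: it assumes a local Spark installation, uses Spark's built-in `rate` test source in place of a real activity stream, and the object name `WindowedCounts`, the 10-second window, and the 30-second watermark are arbitrary choices, not values from the talk.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, window}

object WindowedCounts {
  // Count events per 10-second tumbling window on the `timestamp` column.
  // The watermark lets Spark drop state for windows older than 30 seconds.
  def tumblingCounts(events: DataFrame): DataFrame =
    events
      .withWatermark("timestamp", "30 seconds")
      .groupBy(window(col("timestamp"), "10 seconds"))
      .count()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("WindowedCounts")
      .master("local[2]") // assumption: local run for illustration only
      .getOrCreate()

    // The "rate" source emits rows with columns (timestamp, value).
    val events = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "10")
      .load()

    val query = tumblingCounts(events).writeStream
      .outputMode("update") // emit only windows whose counts changed
      .format("console")
      .option("truncate", "false")
      .start()

    query.awaitTermination(30000) // let it run for about 30 seconds
    query.stop()
    spark.stop()
  }
}
```

Because `tumblingCounts` takes a plain `DataFrame`, the same function can be tried on a static table first, which is usually the quicker way to check the window logic.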
Data Processing Architecture
Streaming Paradigm
● Stream In Stream out
○ Low Latency - How Low?
○ Complexity of Analytics
○ Volume - How high?
● Stream In Batch out
○ No Tight Latency Constraint
○ Higher Ingestion Rate
○ Aggregation / Data or Schema Transformation / Data Enrichment
○ Downstream ETL Operation
Why use Spark Streaming
● No ultra-low latency requirement
○ Processing time of a few seconds is acceptable
● Scalable and Mature Processing engine
● Higher Level API abstraction
○ Ease of Code Reuse from Batch jobs
○ Simple and Modular
● Vibrant Community
○ Active Development on new features
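As a rough illustration of the "code reuse from batch jobs" point, the sketch below applies one transformation function to both a static DataFrame and a streaming one. The function `countsPerBucket`, the bucketing by `value % 10`, and the use of the built-in `rate` source are all hypothetical choices made for this example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SharedLogic {
  // Logic written once against the DataFrame API.
  // Bucketing by `value % 10` is an arbitrary, hypothetical example.
  def countsPerBucket(df: DataFrame): DataFrame =
    df.withColumn("bucket", col("value") % 10)
      .groupBy("bucket")
      .count()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("SharedLogic")
      .master("local[2]") // assumption: local run for illustration only
      .getOrCreate()

    // Batch: a static table of 100 rows with a single `value` column.
    val batchInput = spark.range(100).toDF("value")
    countsPerBucket(batchInput).orderBy("bucket").show()

    // Streaming: the rate source also exposes a `value` column.
    val streamInput = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "5")
      .load()

    val query = countsPerBucket(streamInput).writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination(20000) // let it run for about 20 seconds
    query.stop()
    spark.stop()
  }
}
```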
Spark’s Functionality
Structured Streaming - under the hood
● Abstractions of Repeated Queries
○ Data Streams as unbounded Table
○ Streaming query is a batch-like operation on this table
Structured Streaming - under the hood
● Query Planning & Execution
○ In batch execution, the planner creates a code- and memory-optimized execution plan
○ For a streaming query, the planner converts the streaming logical plan into a series of
incremental execution plans, each processing the next chunk of data
[Diagram: DataFrame → Logical Plan → Planner → Execution Plan (batch); for streaming, Planner → Incremental Execution 1, Incremental Execution 2, Incremental Execution 3]
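The idea behind incremental execution can be sketched without Spark at all. The plain-Scala example below is a conceptual illustration only, not how Spark's planner is implemented: each micro-batch updates a running result, and the final state matches what a batch query over the whole table would return. The names `batchCount` and `incrementalStep` are made up for this sketch.

```scala
object IncrementalCount {
  type Counts = Map[String, Long]

  // Batch-style query: recompute counts over the whole table.
  def batchCount(table: Seq[String]): Counts =
    table.groupBy(identity).map { case (k, v) => k -> v.size.toLong }

  // Incremental step: merge only the newly arrived rows into the previous result.
  def incrementalStep(state: Counts, newRows: Seq[String]): Counts =
    newRows.foldLeft(state) { (acc, row) =>
      acc.updated(row, acc.getOrElse(row, 0L) + 1L)
    }

  def main(args: Array[String]): Unit = {
    // Three chunks of data arriving one after another.
    val microBatches = Seq(Seq("a", "b"), Seq("a"), Seq("c", "a", "b"))

    // Process chunk by chunk, carrying the state forward.
    val incremental = microBatches.foldLeft(Map.empty[String, Long])(incrementalStep)

    // Same answer as running the batch query over the full table.
    val full = batchCount(microBatches.flatten)
    assert(incremental == full)
    println(incremental)
  }
}
```

The incremental version only touches the new rows on each step, which is the reason a streaming engine keeps state between executions instead of rescanning the unbounded table.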
Programming Paradigm
● Start with Spark Session
● Specify Data Source, schema and other options (create input df)
● Write your incremental query to generate output
● Specify Data Sink and other options to export your data
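import org.apache.spark.sql.SparkSession

// Create the Spark session
val spark = SparkSession.builder.appName("kafka streaming Example").getOrCreate()
import spark.implicits._ // encoders required by .as[(String, String)]

// Define the Kafka source; `brokers` and `topics` must be defined beforehand
val ds = spark.readStream.format("kafka")
  .option("kafka.bootstrap.servers", brokers)
  .option("subscribe", topics)
  .load()
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .as[(String, String)]

// Incremental query: count occurrences of each value
val counts = ds.groupBy("value").count()

// Write the full result table to an in-memory sink named "aggregates"
counts.writeStream.queryName("aggregates").format("memory").outputMode("complete").start()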
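With the `memory` sink, the results are kept as an in-memory table named after the query, so they can be read back with ordinary batch calls. The sketch below shows that step in a self-contained form. As an assumption for the sake of running it without a Kafka broker, it swaps the Kafka source for Spark's built-in `rate` source, and the helper `bucketCounts` is hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object MemorySinkDemo {
  // Same shape as the slide's query: a string `value` column, grouped and counted.
  // `value % 3` just keeps the number of groups small for the demo.
  def bucketCounts(input: DataFrame): DataFrame =
    input.selectExpr("CAST(value % 3 AS STRING) AS value")
      .groupBy("value")
      .count()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("MemorySinkDemo")
      .master("local[2]") // assumption: local run for illustration only
      .getOrCreate()

    // Stand-in for the Kafka source: the rate source needs no broker.
    val input = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "5")
      .load()

    val query = bucketCounts(input).writeStream
      .queryName("aggregates") // becomes the name of the in-memory table
      .format("memory")
      .outputMode("complete")
      .start()

    Thread.sleep(5000) // give the query time to process a few micro-batches

    // Read the current result table like any other table.
    spark.table("aggregates").orderBy("value").show()

    query.stop()
    spark.stop()
  }
}
```

The memory sink holds the whole result on the driver, so it is best suited to debugging and small demos rather than production output.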
Productionizing Streaming Application
● Monitoring
○ Throughput
○ Latency
○ Time Lag
● Fault Tolerance
○ Checkpointing
○ Exactly Once or At Least Once
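A minimal sketch of the two production concerns above, under stated assumptions: checkpointing is enabled through the `checkpointLocation` option, and throughput and latency are read from `query.lastProgress`. The temporary checkpoint directory, the `rate` source, the console sink, and the helper `startCounting` are placeholders for illustration. In a real deployment the checkpoint would live on durable storage, and whether the end-to-end guarantee is exactly-once or at-least-once also depends on the source being replayable and on how the sink handles reprocessed batches.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQuery

object ProductionKnobs {
  // Start a small aggregation whose offsets and state are checkpointed to `checkpointDir`.
  def startCounting(spark: SparkSession, checkpointDir: String): StreamingQuery =
    spark.readStream
      .format("rate")
      .option("rowsPerSecond", "10")
      .load()
      .selectExpr("value % 5 AS bucket")
      .groupBy("bucket")
      .count()
      .writeStream
      .outputMode("complete")
      .format("console") // demo sink only
      .option("checkpointLocation", checkpointDir)
      .start()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("ProductionKnobs")
      .master("local[2]") // assumption: local run for illustration only
      .getOrCreate()

    // Hypothetical location; use durable storage (HDFS, S3, ...) in production.
    val checkpointDir = Files.createTempDirectory("stream-checkpoint").toString
    val query = startCounting(spark, checkpointDir)

    for (_ <- 1 to 3) {
      Thread.sleep(5000)
      val p = query.lastProgress // null until the first micro-batch completes
      if (p != null) {
        println(s"batch=${p.batchId} inputRows/s=${p.inputRowsPerSecond} " +
          s"processedRows/s=${p.processedRowsPerSecond} durationMs=${p.durationMs}")
      }
    }

    query.stop()
    spark.stop()
  }
}
```

Restarting the same query with the same checkpoint directory lets Spark resume from the recorded offsets instead of starting over.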
Q&A