Introduction To Flink


Apache Flink is an open-source streaming dataflow engine that provides communication, fault tolerance, and data distribution for distributed computations over data streams. It is a top-level Apache project and a scalable data analytics framework that is compatible with Hadoop. Flink can execute both stream processing and batch processing.


An Introduction to Apache Flink: 4G of Big Data
Presented by: Kundan Kumar, Software Consultant

KnolX Etiquettes
Lack of etiquette and manners is a huge turn-off.
● Punctuality: Respect KnolX session timings. Please do not join a session more than 5 minutes after its start time.
● Feedback: Submit constructive feedback for all sessions, as it is very helpful for the presenter.
● Mute: Stay on mute unless you have questions or concerns.
● Avoid Disturbance: Avoid unwanted chit-chat during the session.

Agenda
01 Big Data evolution
02 Introduction to Flink
03 Features of Flink
04 Architecture of Flink
05 Anatomy of a Flink program
06 Demo

Big Data Evolution
Problems with Big Data:
● Storing huge and exponentially growing datasets.
● Processing huge datasets that have a complex structure.
● The 3 Vs of Big Data: Volume, Variety, Velocity

Big Data Evolution (continued)
● In the early 2000s, the Big Data era began with multiple frameworks, each
focusing on a specific Big Data problem.

Big Data Evolution (continued)
● The goal is a unified platform that can, on its own, handle various Big Data problems:
➢ Batch processing
➢ Stream processing
➢ Graph processing
➢ Iterative processing
● To solve Big Data problems, a unified platform must have the following
characteristics:
➢ Distributed/parallel computation
➢ Fault tolerance
➢ Ease of use (developer-friendly APIs)
➢ Powerful predefined operators/functions (such as join and filter)
➢ Speed

Apache Spark (3G Big Data Framework)
● Spark is a fast cluster computing engine, often cited as up to 100 times faster
than Hadoop MapReduce when running applications in memory.
● Apache Spark is best known for its in-memory computing capabilities, which
deliver high-speed processing.
➢ Problem
● Spark processes data streams in micro-batches, not in real time.
● This gives high throughput but only medium latency in some use cases.
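
The latency cost of micro-batching can be illustrated with a toy model. This is a minimal sketch in plain Python, not Spark or Flink code: the function names, the arrival times, and the 1-second batch interval are all made-up assumptions, and processing time itself is ignored.

```python
import math

def per_event_latencies(arrival_times):
    """Event-at-a-time model: each event is handled as soon as it arrives."""
    return [0.0 for _ in arrival_times]

def micro_batch_latencies(arrival_times, batch_interval):
    """Micro-batch model: each event waits until its batch interval closes."""
    latencies = []
    for t in arrival_times:
        batch_end = (math.floor(t / batch_interval) + 1) * batch_interval
        latencies.append(batch_end - t)
    return latencies

arrivals = [0.25, 0.5, 1.75]  # hypothetical arrival times in seconds
print(micro_batch_latencies(arrivals, batch_interval=1.0))
print(per_event_latencies(arrivals))
```

In this model an event waits, on average, about half a batch interval before it is even looked at, which is the "medium latency" the slide refers to.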

Introduction to Flink
● Apache Flink is a Big Data framework and distributed processing engine for
stateful computations over unbounded and bounded data streams.
● Flink follows a streaming-first principle: it is a true stream processing
engine that treats batch processing as a special case of streaming.
● Flink is designed to run in all common cluster environments and to perform
computations at in-memory speed and at any scale.
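
The idea that batch is a special case of streaming can be sketched without any Flink dependency. The following is an illustrative assumption in plain Python (the `running_sum` operator is hypothetical, not a Flink API): the same operator handles a bounded input, where the last emitted value is the final batch result, and an unbounded input, where only a prefix of the results can ever be observed.

```python
from itertools import count, islice

def running_sum(stream):
    """A streaming operator: emits an updated total after every element."""
    total = 0
    for value in stream:
        total += value
        yield total

# Bounded input ("batch"): the stream ends, so the last value is the final result.
bounded = [1, 2, 3, 4]
batch_result = list(running_sum(bounded))[-1]

# Unbounded input: the stream never ends, so we only look at the first results.
unbounded = count(start=1)  # 1, 2, 3, ... forever
first_results = list(islice(running_sum(unbounded), 4))

print(batch_result)
print(first_results)
```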

➢ A Flink application can consume real-time data from streaming sources such as
message queues or distributed logs, for example Apache Kafka or Amazon Kinesis.
➢ Flink can also consume bounded, historical data from a variety of data sources.
➢ The result streams produced by a Flink application can be sent to a wide
variety of systems connected as sinks.
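
The source, transformation, and sink roles described above can be sketched as follows. This is a plain-Python illustration rather than the Flink API: `source`, `transformations`, and `run_pipeline` are hypothetical names, and an in-memory list stands in for a real source such as Kafka.

```python
def source(records):
    """Stand-in for a source such as a message queue or a file."""
    for record in records:
        yield record

def transformations(stream):
    """Two chained transformations: drop empty lines, then upper-case the rest."""
    non_empty = (line for line in stream if line.strip())
    return (line.upper() for line in non_empty)

def run_pipeline(records, sink):
    """Pull every record from the source through the transformations into the sink."""
    for result in transformations(source(records)):
        sink(result)

collected = []
run_pipeline(["click", "", "view"], sink=collected.append)
print(collected)
```

Because the sink is just a callable here, swapping `collected.append` for `print` or a database writer changes where results go without touching the transformations.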

➢ Programs in Flink are inherently parallel and distributed.
➢ During execution, a stream has one or more stream partitions, and each
operator has one or more operator subtasks.
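
A rough sketch of stream partitions and operator subtasks, under simplifying assumptions: round-robin partitioning, a thread pool standing in for distributed workers, and hypothetical names throughout. None of this is Flink's actual scheduling code.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(records, parallelism):
    """Split one logical stream into `parallelism` stream partitions (round-robin)."""
    partitions = [[] for _ in range(parallelism)]
    for i, record in enumerate(records):
        partitions[i % parallelism].append(record)
    return partitions

def square_subtask(stream_partition):
    """One operator subtask: applies the same logic to its own partition."""
    return [x * x for x in stream_partition]

def run_operator(records, parallelism):
    parts = partition(records, parallelism)
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # One subtask per partition; map() returns results in partition order.
        return list(pool.map(square_subtask, parts))

print(run_operator([1, 2, 3, 4, 5], parallelism=2))
```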

➢ Flink facilitates stateful operations.
➢ The handling of the current event can depend on the accumulated effect of all
the events that came before it.
➢ The set of parallel instances of a stateful operator is effectively a sharded
key-value store. Each parallel instance is responsible for handling events for a
specific group of keys, and the state for those keys is kept locally.
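
The "sharded key-value store" view of keyed state can be sketched like this. It is a minimal illustration in plain Python: `CountingSubtask` and `subtask_for` are hypothetical names, and the CRC32-modulo routing is an assumption chosen for simplicity, not the key-group scheme Flink actually uses.

```python
import zlib
from collections import defaultdict

class CountingSubtask:
    """One parallel instance of a stateful operator with local per-key state."""

    def __init__(self):
        self.state = defaultdict(int)  # key -> running count, kept locally

    def process(self, key):
        self.state[key] += 1
        return key, self.state[key]

def subtask_for(key, parallelism):
    """Route a key to one instance using a stable hash (illustrative scheme)."""
    return zlib.crc32(key.encode("utf-8")) % parallelism

parallelism = 2
subtasks = [CountingSubtask() for _ in range(parallelism)]
events = ["alice", "bob", "alice", "carol", "alice"]
outputs = [subtasks[subtask_for(k, parallelism)].process(k) for k in events]
print(outputs)
```

Since a given key always lands on the same instance, each count depends only on earlier events for that key, and no instance needs to read another instance's state.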

Flink Architecture
➢ Flink 1.x's architecture consists of several components, such as deployment,
core processing, and APIs.
➢ Flink has a layered architecture, and each component belongs to a
specific layer.
➢ Each layer builds on the ones below it, which gives clear abstractions.

Flink's Distributed Execution
➢ Flink is based on a master-slave architecture.
➢ Several processes take part in executing a Flink program: the
Job Manager, the Task Managers, and the Job Client.

Flink Features
➢ High performance
➢ Exactly-once stateful computation
➢ Fault tolerance
➢ Memory management
➢ Optimizer
➢ Unified platform for stream and batch
➢ Rich Libraries

Slide 9: Source, Transformations, Sink
Slide 15: Flink Task Manager
Slide 17: Basic Anatomy of a Flink Program
Slide 18: Demo
Slide 19: Q/A
Slide 21: Thank You!