Apache FLINK.pptx

This document provides an introduction to Apache Flink, a framework for distributed stream and batch data processing. It discusses the differences between batch and stream processing, with batch processing operating on static data periodically and stream processing operating immediately on event streams. The document then describes Flink's programming model including data sources, transformations, and sinks. It explains Flink's time classification of event time, ingestion time, and processing time. It also covers windows, watermarks, and compares Flink to other frameworks like Spark and Hadoop. Key features of Flink highlighted are its streaming capabilities, high speed, fault tolerance, and flexible windowing.

Live Stock Feed
(Stream processing example)

Differences between Batch and Real-Time Processing
Aspect   Batch Processing                                   Real-Time Processing
Data     Static files                                       Event streams
Speed    Processed periodically (minute, hour, day, etc.)   Processed immediately (typically within milliseconds)
Storage  Past data on disk storage                          In-memory storage
Example  Bill generation                                    ATM transaction alert

FLINK program
Data source
Reads data from external systems such as HDFS, Kafka …
Transformation
Applies data transformation operations such as reduce(), sum(), max(), min() …
Data sink
Writes the final data output.
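The three stages above can be sketched in plain Python. This is a minimal illustration of the source, transformation, sink shape and not Flink's API: the function names (source, max_per_key, sink) and the stock-price records are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, float]  # (stock symbol, price); hypothetical record shape


def source() -> Iterator[Record]:
    """Data source: stands in for a reader over HDFS, Kafka, etc."""
    yield from [("ACME", 10.0), ("INIT", 5.0), ("ACME", 12.5), ("INIT", 4.0)]


def max_per_key(records: Iterable[Record]) -> Iterator[Record]:
    """Transformation: keeps a running max() per key and emits it per event."""
    best: Dict[str, float] = {}
    for key, value in records:
        best[key] = max(value, best.get(key, value))
        yield key, best[key]


def sink(records: Iterable[Record], out: List[Record]) -> None:
    """Data sink: collects the final output (a list instead of a file or topic)."""
    out.extend(records)


results: List[Record] = []
sink(max_per_key(source()), results)
print(results)
```

Because each stage is a generator, records flow through one at a time, which mirrors how a streaming job handles events as they arrive instead of waiting for a complete data set.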

FLINK time & window
TIME CLASSIFICATION TYPES
Event time: the time when an event occurs at its source.
Ingestion time: the time when an event arrives at the stream processing system.
Processing time: the time when an event is processed by a stream operator.
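To make the three timestamps concrete, here is a small plain-Python sketch. The Event class, the ingest and process helpers, and the millisecond clock values are all hypothetical and chosen only for illustration; none of them are Flink APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    payload: str
    event_time_ms: int                        # set where the event occurred
    ingestion_time_ms: Optional[int] = None   # stamped on arrival at the system
    processing_time_ms: Optional[int] = None  # stamped when an operator handles it


def ingest(event: Event, clock_ms: int) -> Event:
    """Stamp the event as it enters the stream processing system."""
    event.ingestion_time_ms = clock_ms
    return event


def process(event: Event, clock_ms: int) -> Event:
    """Stamp the event when an operator actually processes it."""
    event.processing_time_ms = clock_ms
    return event


# Hypothetical timeline: occurs at 1000 ms, arrives at 1400 ms, processed at 1900 ms.
e = process(ingest(Event("ATM withdrawal", event_time_ms=1000), clock_ms=1400), clock_ms=1900)

transport_delay_ms = e.ingestion_time_ms - e.event_time_ms
queueing_delay_ms = e.processing_time_ms - e.ingestion_time_ms
print(transport_delay_ms, queueing_delay_ms)
```

Only event time is fixed by the data itself; the other two depend on network and system load, which is why results based on them can change between runs.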

FLINK time & window
DEFINITION
A window is a method for splitting an infinite data set into finite blocks for processing.
Windows split the stream into buckets of finite size, over which we can apply computations.
TYPES

Time Windows based on Processing Time
Tumbling windows and sliding windows
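The difference between the two window types comes down to how a timestamp is assigned to windows. The sketch below is plain Python under simplifying assumptions (integer timestamps, half-open windows [start, end), no offset); the function names are hypothetical and this is not Flink's window assigner code.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # half-open interval [start, end)


def tumbling_window(ts: int, size: int) -> Window:
    """Tumbling windows do not overlap, so each timestamp falls in exactly one."""
    start = ts - (ts % size)
    return (start, start + size)


def sliding_windows(ts: int, size: int, slide: int) -> List[Window]:
    """Sliding windows overlap when slide < size, so a timestamp can fall in several."""
    last_start = ts - (ts % slide)
    # Walk backwards over every window start that still covers ts.
    return [(s, s + size) for s in range(last_start, ts - size, -slide)]


print(tumbling_window(7, size=5))
print(sliding_windows(7, size=10, slide=5))
```

With a size of 10 and a slide of 5, every event belongs to two windows, so each event contributes to two results. Setting the slide equal to the size makes the sliding assignment behave like a tumbling one.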

FLINK Watermark
OUT-OF-ORDER PROBLEM WATERMARK SOLUTION
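The slide names the out-of-order problem and watermarks as the solution without spelling out the mechanism. One common approach, sketched here in plain Python as an assumption and not taken from the slides, is a bounded-delay watermark: the watermark trails the largest event time seen so far by a fixed allowance, and a window is closed once the watermark passes its end. The function name run and the parameters window_size and max_delay are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def run(events: Iterable[Tuple[int, int]], window_size: int, max_delay: int):
    """events: (event_time, value) pairs in arrival order, possibly out of order.

    Returns (fired, dropped): window sums as (start, end, total), and events
    that arrived after their window had already been closed by the watermark.
    """
    open_windows: Dict[int, int] = defaultdict(int)
    fired: List[Tuple[int, int, int]] = []
    dropped: List[Tuple[int, int]] = []
    watermark = float("-inf")

    for ts, value in events:
        start = ts - (ts % window_size)
        if start + window_size <= watermark:
            dropped.append((ts, value))  # too late: its window already fired
        else:
            open_windows[start] += value

        # The watermark asserts that no event older than it is still expected.
        watermark = max(watermark, ts - max_delay)

        # Fire every open window whose end the watermark has reached.
        ready = sorted(s for s in open_windows if s + window_size <= watermark)
        for s in ready:
            fired.append((s, s + window_size, open_windows.pop(s)))

    return fired, dropped


arrivals = [(1, 1), (4, 1), (2, 1), (11, 1), (7, 1), (13, 1), (3, 1)]
fired, dropped = run(arrivals, window_size=5, max_delay=3)
print(fired, dropped)
```

In this run the event with time 2 arrives after the event with time 4 but is still counted, because the watermark has not yet reached the end of its window. The event with time 3 arrives after the watermark has passed 5 and is dropped. A larger max_delay tolerates more disorder at the cost of later results.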

Flink vs Spark vs Hadoop
Aspect                    Apache Hadoop                Apache Spark                   Apache Flink
Data Processing Engine    Batch                        Batch                          Stream
Processing Speed          Slower than Spark and Flink  Up to 100x faster than Hadoop  Faster than Spark
Throughput                Medium                       High                           High
Optimization              Manual                       Manual                         Automatic
Streaming Support         NA                           Spark Streaming                Flink Streaming
Graph Support             NA                           GraphX                         Gelly
Machine Learning Support  NA                           SparkML                        FlinkML
SQL Support               Hive, Impala                 SparkSQL                       Table API and SQL
Data Transfer             Batch                        Batch                          Pipelined and Batch

Features of Apache Flink
1) Has a streaming engine that can run both batch and stream programs.
2) Can process data at lightning-fast speed.
3) APIs available in Java, Scala and Python.
4) Processes data with low latency (typically milliseconds) and high throughput.
5) It is fault tolerant. If a node, an application or a piece of hardware fails, it does not affect the cluster.
6) In-memory management can be customized for better computation.
7) Windowing is very flexible in Apache Flink.

Apache FLINK.pptx

2. Agenda

7. Live Stock Feed (Stream processing example)

8. Differences between Batch and Real-Time Processing

9. Deeper into FLINK

10. Eco-system Apache FLINK

11. FLINK program

12. Architecture

13. Job Running Process

14. FLINK time & window (time classification types)

15. Differences Between the Three Notions of Time

16. FLINK time & window (window definition and types)

17. Time Windows based on Processing Time (tumbling and sliding windows)

18. FLINK Watermark (out-of-order problem and watermark solution)

19. Tips and useful resources

20. Flink vs Spark vs Hadoop

21. Features of Apache Flink

22. Thank You
