Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015


The talk explains how Apache Flink checkpoints stateful jobs with the asynchronous barrier snapshotting algorithm to provide exactly-once semantics for streaming. It also describes Flink's approach to master high availability (HA), which removes the JobManager as a single point of failure. Job checkpointing combined with HA forms the basis of Flink's fault tolerance mechanism for recovering from failures.


Fault Tolerance and Job
Recovery in Apache Flink™
Till Rohrmann
trohrmann@apache.org
@stsffap

Better be safe than sorry
•  Failures will happen
•  EMC estimated $1.7 billion costs due to data loss and system downtime
•  Recovery will save you time and costs
•  Switch between algorithms
•  Live upgrade of your system

Fault tolerance guarantees
•  At most once
   -  No guarantees at all
•  At least once
   -  Sufficient for many applications
•  Exactly once
•  Flink provides all guarantees

Checkpoints
•  Consistent snapshots of distributed data stream and operator state
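To make the idea of a consistent snapshot concrete, here is a minimal single-threaded sketch in plain Java. It is not Flink code: the class `CheckpointReplaySketch`, its `Snapshot` type, and the `run` parameters are hypothetical names chosen for illustration, and the "failure" is simulated. The sketch assumes a replayable input and shows that restoring both the stream position and the operator state yields an exactly-once result, while rewinding the input without restoring the state counts some records twice.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative simulation only; all names here are hypothetical, not Flink APIs.
final class CheckpointReplaySketch {

    // A consistent snapshot pairs the stream position with the operator state.
    static final class Snapshot {
        final int offset;
        final long sum;

        Snapshot(int offset, long sum) {
            this.offset = offset;
            this.sum = sum;
        }
    }

    /**
     * Sums the input and takes a snapshot every `interval` records.
     * If failAt >= 0, the job fails once, just before the record at that index,
     * and rewinds the input to the latest snapshot. When restoreState is true
     * the running sum is restored as well; otherwise it is kept as it was.
     */
    static long run(List<Long> input, int interval, int failAt, boolean restoreState) {
        Snapshot latest = new Snapshot(0, 0L);
        int offset = 0;
        long sum = 0L;
        boolean failed = false;
        while (offset < input.size()) {
            if (!failed && offset == failAt) {
                failed = true;
                offset = latest.offset;      // replay the input from the snapshot
                if (restoreState) {
                    sum = latest.sum;        // and reset the state to match it
                }
                continue;
            }
            sum += input.get(offset);
            offset++;
            if (offset % interval == 0) {
                latest = new Snapshot(offset, sum);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Long> input = Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L);
        System.out.println(run(input, 4, -1, true));  // no failure: 55
        System.out.println(run(input, 4, 7, true));   // state and position restored: 55
        System.out.println(run(input, 4, 7, false));  // position only: 73 (5, 6, 7 counted twice)
    }
}
```

The third call illustrates why the snapshot has to cover the stream and the operator state together: replay alone gives at-least-once behaviour.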

Barriers
•  Markers for checkpoints
•  Injected into the data flow
•  Alignment for multi-input operators
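Barrier alignment for a multi-input operator can be sketched roughly as follows. This is a simplified plain-Java model, not Flink's implementation: `BarrierAlignmentSketch`, the `BARRIER` marker value, and the running-sum state are assumptions made for the example, and only one checkpoint is assumed to be in flight at a time. Once a barrier arrives on one channel, later records from that channel are held back until the barrier has arrived on every channel; only then is the state snapshotted and the buffered records released.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Simplified model of barrier alignment; names and encoding are hypothetical.
final class BarrierAlignmentSketch {
    static final long BARRIER = Long.MIN_VALUE;  // marker element standing in for a barrier

    private final boolean[] blocked;             // channels that already delivered the barrier
    private final List<Queue<Long>> buffered = new ArrayList<>();
    private int barriersSeen = 0;
    private long sum = 0L;                       // operator state: running sum
    final List<Long> snapshots = new ArrayList<>();

    BarrierAlignmentSketch(int numChannels) {
        blocked = new boolean[numChannels];
        for (int i = 0; i < numChannels; i++) {
            buffered.add(new ArrayDeque<>());
        }
    }

    void onElement(int channel, long value) {
        if (value == BARRIER) {
            if (blocked[channel]) {
                throw new IllegalStateException("sketch supports one checkpoint in flight");
            }
            blocked[channel] = true;
            barriersSeen++;
            if (barriersSeen == blocked.length) {
                completeAlignment();
            }
        } else if (blocked[channel]) {
            buffered.get(channel).add(value);    // belongs to the next checkpoint: hold back
        } else {
            sum += value;                        // belongs to the current checkpoint
        }
    }

    private void completeAlignment() {
        snapshots.add(sum);                      // state reflects exactly the pre-barrier records
        barriersSeen = 0;
        Arrays.fill(blocked, false);
        for (int c = 0; c < buffered.size(); c++) {
            Queue<Long> queue = buffered.get(c);
            while (!queue.isEmpty()) {
                onElement(c, queue.poll());      // release the held-back records
            }
        }
    }

    long currentSum() {
        return sum;
    }

    public static void main(String[] args) {
        BarrierAlignmentSketch op = new BarrierAlignmentSketch(2);
        op.onElement(0, 1);
        op.onElement(1, 2);
        op.onElement(0, BARRIER);
        op.onElement(0, 10);                     // arrives early, but is after the barrier
        op.onElement(1, 3);
        op.onElement(1, BARRIER);                // alignment completes here
        op.onElement(1, 20);
        System.out.println(op.snapshots);        // [6]
        System.out.println(op.currentSum());     // 36
    }
}
```

In the usage above, the record 10 reaches the operator before 3 does, yet it is excluded from the snapshot because it followed the barrier on its channel.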

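Operator State
•  Stateless operators

    ds.filter(_ != 0)

•  System state

    ds.keyBy(0).window(TumblingTimeWindows.of(5, TimeUnit.SECONDS))

•  User-defined state

    // Imports assume the Flink 0.10-era API used in the talk.
    import org.apache.flink.api.common.functions.RichReduceFunction;
    import org.apache.flink.api.common.state.OperatorState;
    import org.apache.flink.configuration.Configuration;

    public class CounterSum extends RichReduceFunction<Long> {
        // Checkpointed counter of how often reduce() has been called.
        private OperatorState<Long> counter;

        @Override
        public Long reduce(Long v1, Long v2) throws Exception {
            counter.update(counter.value() + 1);
            return v1 + v2;
        }

        @Override
        public void open(Configuration config) throws Exception {
            // Register the state under the name "counter" with default value 0.
            counter = getRuntimeContext().getOperatorState("counter", 0L, false);
        }
    }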

Advantages
•  Separation of app logic from recovery
   -  Checkpointing interval is just a config parameter
•  High throughput
   -  Controllable checkpointing overhead
•  Low impact on latency

Without high availability
[Diagram: JobManager, TaskManager]

With high availability
[Diagram: JobManager, TaskManager, stand-by JobManager, Apache ZooKeeper™; caption: KEEP GOING]

Persisting jobs
[Diagram: Client, JobManager, TaskManagers, Apache ZooKeeper™]
1.  Submit job
2.  Persist execution graph
3.  Write handle to ZooKeeper
4.  Deploy tasks
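The four steps above can be sketched in plain Java as follows. This is a toy model under stated assumptions, not Flink's code: `JobPersistenceSketch`, `SharedServices`, and the `/jobgraphs/` path are hypothetical, and two in-memory maps stand in for the durable storage and for ZooKeeper. The slides do not say where the execution graph is persisted, so the separate storage map is an assumption. The `recoverJobs` method is also an addition, showing how a stand-by JobManager could use the handles after a failover.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of job persistence; every name below is hypothetical.
final class JobPersistenceSketch {

    // In-memory stand-ins for services that outlive a single JobManager.
    static final class SharedServices {
        final Map<String, String> durableStorage = new HashMap<>(); // path -> serialized graph
        final Map<String, String> zooKeeper = new TreeMap<>();      // job id -> handle (path)
    }

    static final class JobManager {
        private final SharedServices shared;
        final List<String> deployed = new ArrayList<>();

        JobManager(SharedServices shared) {
            this.shared = shared;
        }

        // Step 1: the client submits a job.
        void submit(String jobId, String executionGraph) {
            String handle = "/jobgraphs/" + jobId;
            shared.durableStorage.put(handle, executionGraph); // Step 2: persist execution graph
            shared.zooKeeper.put(jobId, handle);               // Step 3: write handle to ZooKeeper
            deployed.add(executionGraph);                      // Step 4: deploy tasks
        }

        // A stand-by JobManager that takes over resolves the handles and redeploys.
        void recoverJobs() {
            for (String handle : shared.zooKeeper.values()) {
                deployed.add(shared.durableStorage.get(handle));
            }
        }
    }

    public static void main(String[] args) {
        SharedServices shared = new SharedServices();
        JobManager active = new JobManager(shared);
        active.submit("job-1", "source -> map -> sink");

        JobManager standBy = new JobManager(shared);  // the active JobManager is assumed lost
        standBy.recoverJobs();
        System.out.println(standBy.deployed);         // [source -> map -> sink]
    }
}
```

Only the small handle goes to the coordination service in this sketch; the potentially large graph stays in the storage layer.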

Handling checkpoints
[Diagram: Client, JobManager, TaskManagers, Apache ZooKeeper™]
1.  Take snapshots
2.  Persist snapshots
3.  Send handles to JM
4.  Create global checkpoint
5.  Persist global checkpoint
6.  Write handle to ZooKeeper
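A rough sketch of the JobManager side of this sequence, in plain Java, might look like the following. It is an illustration only: `CheckpointCoordinatorSketch`, the `acknowledge` method, the `latest-checkpoint` key, and the path layout are invented for the example, and in-memory maps again stand in for durable storage and ZooKeeper. Steps 1 and 2 happen on the tasks and are represented here only by the state handle each task sends.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model of assembling a global checkpoint; all names are hypothetical.
final class CheckpointCoordinatorSketch {
    private final long checkpointId;
    private final Set<String> expectedTasks;
    private final Map<String, String> acknowledged = new TreeMap<>(); // task -> state handle

    final Map<String, String> durableStorage = new HashMap<>();       // stand-in for storage
    final Map<String, String> zooKeeper = new HashMap<>();            // stand-in for ZooKeeper

    CheckpointCoordinatorSketch(long checkpointId, Set<String> expectedTasks) {
        this.checkpointId = checkpointId;
        this.expectedTasks = new HashSet<>(expectedTasks);
    }

    /**
     * Step 3: a task reports the handle of its persisted snapshot.
     * Returns true once every expected task has reported and the global
     * checkpoint has been created, persisted, and registered.
     */
    boolean acknowledge(String task, String stateHandle) {
        if (!expectedTasks.contains(task)) {
            throw new IllegalArgumentException("unknown task: " + task);
        }
        acknowledged.put(task, stateHandle);
        if (!acknowledged.keySet().containsAll(expectedTasks)) {
            return false;                                      // still waiting for other tasks
        }
        String global = "checkpoint " + checkpointId + " " + acknowledged; // Step 4
        String path = "/checkpoints/" + checkpointId;
        durableStorage.put(path, global);                      // Step 5: persist global checkpoint
        zooKeeper.put("latest-checkpoint", path);              // Step 6: write handle to ZooKeeper
        return true;
    }

    public static void main(String[] args) {
        Set<String> tasks = new HashSet<>();
        tasks.add("source");
        tasks.add("sink");
        CheckpointCoordinatorSketch coordinator = new CheckpointCoordinatorSketch(7, tasks);
        System.out.println(coordinator.acknowledge("source", "/state/source-7")); // false
        System.out.println(coordinator.acknowledge("sink", "/state/sink-7"));     // true
        System.out.println(coordinator.zooKeeper.get("latest-checkpoint"));       // /checkpoints/7
    }
}
```

The design point the sketch tries to capture is that nothing is registered in ZooKeeper until every task has acknowledged, so a recovering JobManager only ever sees complete checkpoints.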

TL;DR
•  Job recovery mechanism with low latency and high throughput
•  Exactly-once processing semantics
•  No single point of failure
→ Flink will always keep processing your data




Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015

1. Fault Tolerance and Job Recovery in Apache Flink™ Till Rohrmann trohrmann@apache.org @stsffap

3. Better be safe than sorry
§  Failures will happen
§  EMC estimated $1.7 billion costs due to data loss and system downtime
§  Recovery will save you time and costs
§  Switch between algorithms
§  Live upgrade of your system

4. Fault Tolerance

5. Fault tolerance guarantees
§  At most once
•  No guarantees at all
§  At least once
•  For many applications sufficient
§  Exactly once
§  Flink provides all guarantees
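The difference between the guarantees above can be illustrated with a toy replay experiment. This is a minimal sketch, not Flink code: the class `ReplayDemo`, its methods, and the idea of modelling a job as a running sum are all assumptions made for illustration. A failure happens after some records past the last checkpoint were processed. With exactly-once recovery, both the state and the source position are rolled back to the checkpoint; in the at-least-once variant shown here, the source is rewound but effects already applied are not rolled back, so some records count twice.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model (hypothetical, not Flink API): a job that sums a finite stream of records. */
public class ReplayDemo {

    /** Adds records[from..to) onto the given state and returns the new state. */
    static long process(List<Long> records, int from, int to, long state) {
        for (int i = from; i < to; i++) {
            state += records.get(i);
        }
        return state;
    }

    /** Recovery restores the checkpointed state AND rewinds the source to the checkpoint. */
    public static long exactlyOnce(List<Long> records, int checkpointOffset, int failAt) {
        long checkpointState = process(records, 0, checkpointOffset, 0);
        // Work done between the checkpoint and the failure is lost and discarded.
        process(records, checkpointOffset, failAt, checkpointState);
        // Restart from the checkpointed state and offset.
        return process(records, checkpointOffset, records.size(), checkpointState);
    }

    /** Recovery rewinds the source, but effects applied before the failure are kept. */
    public static long atLeastOnce(List<Long> records, int checkpointOffset, int failAt) {
        long stateAtFailure = process(records, 0, failAt, 0);
        // Records in [checkpointOffset, failAt) are processed a second time.
        return process(records, checkpointOffset, records.size(), stateAtFailure);
    }

    public static void main(String[] args) {
        List<Long> records = Arrays.asList(1L, 2L, 3L, 4L, 5L);
        System.out.println("exactly once:  " + exactlyOnce(records, 2, 4));
        System.out.println("at least once: " + atLeastOnce(records, 2, 4));
    }
}
```

In this model no record is ever lost in either variant; only the exactly-once variant also avoids counting a record twice.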

6. Checkpoints
§  Consistent snapshots of distributed data stream and operator state

7. Barriers
§  Markers for checkpoints
§  Injected in the data flow

8.
§  Alignment for multi-input operators
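Barrier alignment for a multi-input operator can be sketched as follows. This is a simplified simulation under stated assumptions, not Flink's implementation: the class `BarrierAlignment` and its methods are hypothetical, the operator state is a single running sum, and only one checkpoint is in flight at a time. Once a barrier arrives on one input channel, later records from that channel are buffered; when barriers have arrived on all channels, the state is snapshotted and the buffered records are released.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

/** Toy multi-input operator (hypothetical) that sums records and aligns checkpoint barriers. */
public class BarrierAlignment {
    private final int numChannels;
    private final Set<Integer> blockedChannels = new HashSet<>();
    private final Queue<Long> buffered = new ArrayDeque<>();
    private final List<Long> snapshots = new ArrayList<>();
    private long sum = 0;

    public BarrierAlignment(int numChannels) {
        this.numChannels = numChannels;
    }

    /** A record on a blocked channel belongs to the next checkpoint and is held back. */
    public void onRecord(int channel, long value) {
        if (blockedChannels.contains(channel)) {
            buffered.add(value);
        } else {
            sum += value;
        }
    }

    /** Blocks the channel; once all channels are blocked, snapshots and resumes. */
    public void onBarrier(int channel) {
        blockedChannels.add(channel);
        if (blockedChannels.size() == numChannels) {
            snapshots.add(sum);          // state contains exactly the pre-barrier records
            blockedChannels.clear();     // unblock all inputs
            while (!buffered.isEmpty()) {
                sum += buffered.poll();  // now process what was held back
            }
        }
    }

    public long sum() {
        return sum;
    }

    public List<Long> snapshots() {
        return snapshots;
    }

    public static void main(String[] args) {
        BarrierAlignment op = new BarrierAlignment(2);
        op.onRecord(0, 1);
        op.onRecord(1, 2);
        op.onBarrier(0);      // channel 0 is now blocked
        op.onRecord(0, 10);   // buffered: arrived after the barrier on channel 0
        op.onRecord(1, 3);    // still before the barrier on channel 1
        op.onBarrier(1);      // alignment complete: snapshot, then release the buffer
        System.out.println("snapshot=" + op.snapshots() + " sum=" + op.sum());
    }
}
```

The snapshot excludes the record that overtook the barrier on channel 0, which is what makes the snapshot consistent across both inputs.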

9. Operator State
§  Stateless operators
     ds.filter(_ != 0)
§  System state
     ds.keyBy(0).window(TumblingTimeWindows.of(5, TimeUnit.SECONDS))
§  User defined state
     // Sums its inputs and counts how often reduce() was called.
     // The counter is operator state and is included in checkpoints.
     public class CounterSum extends RichReduceFunction<Long> {
         private OperatorState<Long> counter;

         @Override
         public Long reduce(Long v1, Long v2) throws Exception {
             counter.update(counter.value() + 1);
             return v1 + v2;
         }

         @Override
         public void open(Configuration config) throws Exception {
             // Register the state under the name "counter" with default value 0.
             counter = getRuntimeContext().getOperatorState("counter", 0L, false);
         }
     }

14. Advantages
§  Separation of app logic from recovery
•  Checkpointing interval is just a config parameter
§  High throughput
•  Controllable checkpointing overhead
§  Low impact on latency
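As an illustration of the checkpointing interval being a configuration parameter, the sketch below enables checkpointing on a streaming job. It assumes the Flink streaming Java API is on the classpath; the 5000 ms interval, the job name, and the sample elements are arbitrary example values, and the package of `CheckpointingMode` may differ between Flink versions.

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointedJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Take a checkpoint every 5000 ms (example value) with exactly-once guarantees.
        // The application logic below does not change when this line changes.
        env.enableCheckpointing(5000, CheckpointingMode.EXACTLY_ONCE);

        env.fromElements(1L, 2L, 3L).print();

        env.execute("checkpointed-job");
    }
}
```

A shorter interval means less work to replay after a failure but more checkpointing overhead; that trade-off is the "controllable overhead" mentioned on the slide.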

16. Cluster High Availability

17. Without high availability (diagram: JobManager, TaskManager)

18. With high availability (diagram: JobManager, TaskManager, stand-by JobManager, Apache ZooKeeper™; caption: KEEP GOING)

19-22. Persisting jobs (diagram: JobManager, Client, TaskManagers, Apache ZooKeeper™)
1.  Submit job
2.  Persist execution graph
3.  Write handle to ZooKeeper
4.  Deploy tasks

23-27. Handling checkpoints (diagram: JobManager, Client, TaskManagers, Apache ZooKeeper™)
1.  Take snapshots
2.  Persist snapshots
3.  Send handles to JM
4.  Create global checkpoint
5.  Persist global checkpoint
6.  Write handle to ZooKeeper
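The six checkpoint-handling steps can be sketched with in-memory stand-ins. This is an illustrative model only: the class `CheckpointHandles`, the handle names, and the path `/latest-checkpoint` are hypothetical, plain maps replace the durable storage and ZooKeeper, and real snapshots are of course larger than a single `Long`. The point it tries to show is that the bulky state goes to storage while ZooKeeper only holds a small handle, which a stand-by JobManager can follow during recovery.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (hypothetical) of persisting checkpoints and publishing only a handle. */
public class CheckpointHandles {
    // Stand-in for durable storage holding task snapshots.
    private final Map<String, Long> stateStore = new HashMap<>();
    // Stand-in for durable storage holding global checkpoints (task name -> state handle).
    private final Map<String, Map<String, String>> checkpointStore = new HashMap<>();
    // Stand-in for ZooKeeper: stores only a small pointer to the latest checkpoint.
    private final Map<String, String> zooKeeper = new HashMap<>();
    private int nextHandle = 0;

    /** Steps 1-6: snapshot tasks, persist, collect handles, persist the global checkpoint. */
    public void checkpoint(Map<String, Long> taskStates) {
        Map<String, String> globalCheckpoint = new HashMap<>();
        for (Map.Entry<String, Long> task : taskStates.entrySet()) {
            String stateHandle = "state-" + nextHandle++;
            stateStore.put(stateHandle, task.getValue());          // 2. persist snapshot
            globalCheckpoint.put(task.getKey(), stateHandle);      // 3. send handle to JM
        }
        String checkpointHandle = "checkpoint-" + nextHandle++;    // 4. create global checkpoint
        checkpointStore.put(checkpointHandle, globalCheckpoint);   // 5. persist global checkpoint
        zooKeeper.put("/latest-checkpoint", checkpointHandle);     // 6. write handle to ZooKeeper
    }

    /** What a newly elected JobManager would do: follow the handle back to the task states. */
    public Map<String, Long> recover() {
        Map<String, Long> restored = new HashMap<>();
        String checkpointHandle = zooKeeper.get("/latest-checkpoint");
        if (checkpointHandle == null) {
            return restored; // no completed checkpoint yet
        }
        for (Map.Entry<String, String> e : checkpointStore.get(checkpointHandle).entrySet()) {
            restored.put(e.getKey(), stateStore.get(e.getValue()));
        }
        return restored;
    }

    public static void main(String[] args) {
        CheckpointHandles jm = new CheckpointHandles();
        Map<String, Long> states = new HashMap<>();
        states.put("source", 42L);
        states.put("counter", 7L);
        jm.checkpoint(states);
        System.out.println(jm.recover());
    }
}
```

The same handle pattern applies to the "Persisting jobs" steps: the execution graph is persisted, and only its handle is written to ZooKeeper.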

28. Conclusion

31. TL;DL
§  Job recovery mechanism with low latency and high throughput
§  Exactly once processing semantics
§  No single point of failure
→ Flink will always keep processing your data

32. flink.apache.org @ApacheFlink
