My use case is to monitor and improve overall search data quality, to detect unusual patterns in users' search behavior, and to report on-site intent back to the respective business stakeholders. To achieve this, I explored various big data processing engines capable of processing huge volumes of data with complex business logic in real time, and eventually settled on Flink stream processing. This talk showcases how I used Flink to accomplish that goal.
1. Real Time DQMM on Flink
Jaydeep
Staff Engineer in Search Team
Apache Oozie Committer
June 2019
2. Table of Contents
• What is Real Time Aggregation?
• Use Case
• What do we deal with?
• System Requirements
• Spark vs Flink
• Flink Cluster setup
• Flink on Yarn
• Architecture
• 100% Data Completeness
• Open Items
3. What is Real Time Aggregation?
• What is real time?
• What is the processing delay today?
• What is the real-time offering?
• Why do we need it?
4. Use Case
• Bug detection in Response log
• Bot detection
• Best seller item tracking
• Item Catalogue health
• Item out of stock (especially on event days)
• Top query monitoring
• Category performance
5. What do we deal with?
~4 billion logs per day
~8 million records per minute
~800 GB of data per day
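For intuition, the daily figures above can be converted into per-second and per-record numbers. This is only a back-of-the-envelope check (decimal GB assumed); note that the quoted ~8 million records per minute is roughly 3x the daily average, which suggests it is a peak-traffic figure:

```python
# Back-of-the-envelope rates derived from the volumes on this slide.
logs_per_day = 4_000_000_000        # ~4 billion logs per day
bytes_per_day = 800 * 10**9         # ~800 GB per day (decimal GB assumed)

avg_logs_per_sec = logs_per_day / 86_400   # 86,400 seconds per day
avg_logs_per_min = logs_per_day / 1_440    # 1,440 minutes per day
avg_bytes_per_log = bytes_per_day / logs_per_day

print(f"average logs/second : {avg_logs_per_sec:,.0f}")   # ~46,296
print(f"average logs/minute : {avg_logs_per_min:,.0f}")   # ~2.8 million
print(f"average bytes/log   : {avg_bytes_per_log:.0f}")   # ~200
```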
6. System Requirements
• Support for real-time processing
• Support for tracking events
• Easy recovery from failure
• Exactly-once processing
• Backpressure handling
• Support for event-based, time-based, and dynamic windows
• Highly available
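As an illustrative sketch (hostnames and paths are placeholders, and exact keys vary by Flink version), several of these requirements map directly onto entries in Flink's `flink-conf.yaml`:

```yaml
# flink-conf.yaml (illustrative fragment; hosts/paths are placeholders)

# Recovery / exactly-once: periodic checkpoints into durable storage
state.backend: rocksdb
state.checkpoints.dir: hdfs:///flink/checkpoints

# Automatic restart on failure
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s

# High availability via ZooKeeper
high-availability: zookeeper
high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
high-availability.storageDir: hdfs:///flink/ha
```

Exactly-once mode itself is typically enabled in the job code, e.g. `env.enableCheckpointing(interval, CheckpointingMode.EXACTLY_ONCE)`.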
7. Spark vs Flink
Criteria                     | Spark        | Flink
-----------------------------|--------------|------------------
Data Processing              | Mini-batch   | Stream processing
Data Shuffling               | Polling      | Trigger
Window Function              | Time-based   | Time/Event/Custom
Memory Management            | Configurable | Auto-managed
Recovery                     | DAG level    | State level
Re-utilization and Iteration | By stage     | By event
11. 100% Data Completeness
Event Arrival Time  | Actual Event Time   | Clicks
--------------------|---------------------|-------
2019-06-01 10:01:00 | 2019-06-01 10:01:00 | 3
2019-06-01 10:02:00 | 2019-06-01 10:02:00 | 1
2019-06-01 10:04:00 | 2019-06-01 10:03:00 | 4
2019-06-01 10:06:00 | 2019-06-01 10:04:00 | 5
2019-06-01 10:08:00 | 2019-06-01 10:04:00 | 1

Processed Time      | Event Time Window   | Clicks
--------------------|---------------------|-------
2019-06-01 10:05:00 | 2019-06-01 10:05:00 | 8
2019-06-01 10:10:00 | 2019-06-01 10:10:00 | 6
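The two tables illustrate the gap between processing time and event time: the two events that actually happened at 10:04 arrived late (10:06 and 10:08), so windowing by arrival time counts them in the 10:10 window, while windowing by event time attributes all five events to the 10:05 window. A small self-contained simulation of this arithmetic (plain Python, not Flink code):

```python
from datetime import datetime
from collections import defaultdict

# (arrival time, actual event time, clicks) -- rows from the first table
events = [
    ("10:01", "10:01", 3),
    ("10:02", "10:02", 1),
    ("10:04", "10:03", 4),
    ("10:06", "10:04", 5),
    ("10:08", "10:04", 1),
]

def window_end(ts, minutes=5):
    """Label of the tumbling window containing ts (window end, inclusive)."""
    t = datetime.strptime(ts, "%H:%M")
    epoch_min = t.hour * 60 + t.minute
    end = ((epoch_min - 1) // minutes + 1) * minutes
    return f"{end // 60:02d}:{end % 60:02d}"

def aggregate(events, key_index):
    """Sum clicks per window, keyed by arrival (0) or event time (1)."""
    sums = defaultdict(int)
    for row in events:
        sums[window_end(row[key_index])] += row[2]
    return dict(sums)

by_processing_time = aggregate(events, 0)
by_event_time = aggregate(events, 1)

print(by_processing_time)  # {'10:05': 8, '10:10': 6} -> matches the table
print(by_event_time)       # {'10:05': 14}            -> all clicks belong here
```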
12. 100% Data Completeness
• Event-time data processing
• Handling delayed events
• Preventing false anomaly detection
• Probability-based model for data completeness
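A minimal sketch of the event-time mechanics these bullets describe (plain Python mirroring Flink's bounded-out-of-orderness watermarking, not the Flink API; the window size and lateness bound are assumptions): the watermark lags the maximum seen event time by a fixed bound, a window fires only once the watermark passes its end, and anything arriving after that is flagged as late rather than silently producing a false anomaly.

```python
from collections import defaultdict

WINDOW = 5        # tumbling window size, in minutes (assumed)
OUT_OF_ORDER = 2  # watermark lag behind max event time (assumed bound)

def run(stream):
    """stream: iterable of (event_time_minute, clicks), in arrival order."""
    windows = defaultdict(int)   # open windows: end minute -> click sum
    fired, late = {}, []
    max_event_time = float("-inf")

    for event_time, clicks in stream:
        end = ((event_time - 1) // WINDOW + 1) * WINDOW  # window end label
        if end in fired:
            late.append((event_time, clicks))   # arrived after its window fired
        else:
            windows[end] += clicks
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - OUT_OF_ORDER
        for w in [w for w in windows if w <= watermark]:
            fired[w] = windows.pop(w)           # watermark passed: fire window
    return fired, windows, late

# Minutes since 10:00; the event at minute 4 arrives after minute 9's event.
stream = [(1, 3), (2, 1), (3, 4), (9, 2), (4, 5)]
fired, open_windows, late = run(stream)
print(fired)  # {5: 8}   -> window ending at minute 5 fired with 8 clicks
print(late)   # [(4, 5)] -> caught as late, not mis-counted as an anomaly
```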
13. Open Items
• Real-time model training
• Handling seasonality while detecting anomalies