Jay Kreps is a Principal Staff Engineer at LinkedIn, where he is the lead architect for online data infrastructure. He is among the original authors of several open source projects, including a distributed key-value store called Project Voldemort, a messaging system called Kafka, and a stream processing system called Samza. This talk gives an introduction to Apache Kafka, a distributed messaging system. It covers both how Kafka works and how it is used at LinkedIn for log aggregation, messaging, ETL, and real-time stream processing.
5. Characteristics
• Scalability of a filesystem
– Hundreds of MB/sec/server throughput
– Many TB per server
• Guarantees of a database
– Messages strictly ordered
– All data persistent
• Distributed by default
– Replication
– Partitioning model
16. Kafka At LinkedIn
• 175 TB of in-flight log data per colo
• Replicated to each datacenter
• Tens of thousands of data producers
• Thousands of consumers
• 7 million messages written/sec
• 35 million messages read/sec
• Hadoop integration
Who are you?
What is this talk about?
Exciting topic
More
Messaging system, like JMS (but different!)
Producers, consumers distributed
Start with state at LinkedIn, describe each pipeline
1 Pipeline for database data
1 Pipeline for metrics
1 Pipeline for events
1 JMS-based pipeline
No pipeline for application logs
300 ActiveMQ brokers
The log is fundamental abstraction Kafka provides
You can use a log as a drop-in replacement for a messaging system, but it can also do a lot more
What is a log?
Traditional uses?
Non-traditional uses…
Time ordered
Semi-structured
Data structure not a text file
List of changes
Contents of a record don't matter
Indexed by “time”
Not application log (i.e. text file)
Remotely accessible
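The notes above describe a log as an append-only data structure whose records are indexed by an offset that serves as "time". A minimal sketch of that idea (an illustrative toy, not Kafka's actual API):

```python
class Log:
    """A minimal append-only log: records indexed by offset ("time")."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Append a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Read up to max_records starting at a given offset."""
        return self._records[offset:offset + max_records]

log = Log()
log.append({"user": "alice", "action": "login"})
off = log.append({"user": "bob", "action": "view"})
assert log.read(off) == [{"user": "bob", "action": "view"}]
```

The key properties from the slides fall out of this shape: appends are strictly ordered, old records stay readable, and any consumer can re-read from any offset.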
State machine replication
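State machine replication is the classic use of such a log: if every replica is a deterministic state machine and applies the same ordered sequence of operations, all replicas end in the same state. A tiny illustration (hypothetical ops, not Kafka code):

```python
def apply(state, op):
    """Deterministic state transition: op is a (key, value) put."""
    key, value = op
    new_state = dict(state)
    new_state[key] = value
    return new_state

# Shared, ordered log of operations.
ops_log = [("x", 1), ("y", 2), ("x", 3)]

replica_a = {}
replica_b = {}
for op in ops_log:
    replica_a = apply(replica_a, op)
for op in ops_log:
    replica_b = apply(replica_b, op)

# Same log + deterministic transitions => identical replicas.
assert replica_a == replica_b == {"x": 3, "y": 2}
```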
Data model of Kafka: A topic
Partitions can be spread over machines, replicated
Path of a write
Leadership failover
Guarantees
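The partitioning model above can be sketched as: a topic is split into partitions, a producer hashes the message key to pick a partition, and ordering is guaranteed within each partition. A toy version (Kafka's default partitioner actually uses murmur2 over the key bytes; crc32 here is just a stand-in stable hash):

```python
import zlib

NUM_PARTITIONS = 3
partitions = [[] for _ in range(NUM_PARTITIONS)]

def partition_for(key):
    # Stable hash of the key picks the partition.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

def produce(key, value):
    p = partition_for(key)
    partitions[p].append((key, value))
    return p, len(partitions[p]) - 1  # (partition, offset)

# All writes for one key land in one partition, so their order is preserved.
produce("user-42", "login")
produce("user-42", "view")
p, _ = produce("user-42", "logout")
assert [v for _, v in partitions[p]] == ["login", "view", "logout"]
```

Spreading partitions over machines gives horizontal scalability, and replicating each partition (with leadership failover) gives the durability guarantees on the earlier slide.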
AKA ETL
Many systems
Event data
Most important problem for data-centric companies
Integration >> ML
Maslow’s Hierarchy
Abraham Maslow, Psychologist, 1943
Physiological – eat, drink, sleep
Safety – Not being attacked
Love/Belonging – friends and family
Esteem – respect of others
Self-Actualization – morality, creativity, spontaneity
Want to do Deep Learning
Instead finding that their CSV data ALSO has commas inside the values
Copying files around
Ugh The Caveman
Data Warehousing has a bad reputation
Two exacerbating factors
15 years ago, just the first one (transactional data)
New categories are very high volume, maybe 100x the transactional data
Look like events
Internet of things
One-size fits all
Tell story:
Started with Hadoop, added arrows to get data there
Want to build fancy algorithms, need data (expectation 90% of time for fancy, 10% for data)
Holy shit this is hard!
Data is missing, data is late, computation runs on wrong data
Hadoop without good data is just a very expensive space heater
Never get to full connectivity
Metcalfe’s law
Each new system connects to get/give data
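The Metcalfe's-law point is back-of-envelope arithmetic: with point-to-point integration, every pair of systems needs its own pipeline, so the count grows quadratically; with a central multi-subscriber log, each system connects once. A quick illustration (my numbers, not from the talk):

```python
def point_to_point(n):
    # Each pair of systems needs its own pipeline: O(n^2).
    return n * (n - 1) // 2

def via_central_log(n):
    # Each system connects once, to the log: O(n).
    return n

# 10 systems: 45 bespoke pipelines vs 10 log connections.
assert point_to_point(10) == 45
assert via_central_log(10) == 10
```

This is why the pipelines at the start of the talk (database data, metrics, events, JMS) kept multiplying until they were consolidated onto Kafka.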
All data in multi-subscriber, real-time logs
The company is a big distributed system
The data center is the distributed system
Three dims:
Throughput
Guarantees
Latency
Advantages over messaging:
Huge data backlog
Order
Advantages over files
Real-time
Advantage over both: principled notion of time
Whole organization is big distributed system
Commit log = data transfer
Stream processing = triggers
Batch is the dominant paradigm for data processing, why?
Service: One input = one output
Batch job: All inputs = all outputs
Stream computing: any window = output for that window
No different from batch processing flow (instead of files/tables, logs)
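The service/batch/stream distinction above can be made concrete: a batch job maps all inputs to one output, while a stream computation emits an output for each window of the log, and processing every window recovers the batch result. A sketch with made-up records:

```python
log = [("t1", 3), ("t2", 5), ("t3", 2), ("t4", 7)]  # (timestamp, value)

# Batch job: all inputs -> all outputs.
batch_total = sum(v for _, v in log)

def windows(records, size):
    """Split the log into consecutive fixed-size windows."""
    for i in range(0, len(records), size):
        yield records[i:i + size]

# Stream computation: any window -> output for that window.
stream_totals = [sum(v for _, v in w) for w in windows(log, 2)]

assert batch_total == 17
assert stream_totals == [8, 9]
# Processing the whole log window by window matches the batch answer.
assert sum(stream_totals) == batch_total
```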
Storm and Samza
About process management – both integrate with Kafka
MapReduce and HDFS