Multitenancy: Kafka clusters
for everyone at LINE
Yuto Kawamura - LINE Corporation
Speaker introduction
— Yuto Kawamura
— Senior Software Engineer at
LINE
— Leading project to redesign
microservices architecture
w/ Kafka
— Apache Kafka Contributor
— Speaker at Kafka Summit SF
2017
— Also at Kafka Meetup #3
Outline
— Kafka at LINE as of today (2018.04)
— Challenges on multitenancy
— Engineering for achieving multitenancy
Kafka at LINE as of today
(2018.04)
We have more clusters
— Added more clusters since last
year to support:
— Different DCs
— Security sensitive data w/
SASL+TLS
— They are separated by "purposes"
but not by "users" ; our
multitenancy strategy
— Fewer clusters allow us to
concentrate our engineering
resources for maximizing their
performance
— They're conceptually the "Data
Hub" too
One cluster has many users
— Topics:
— 100 ~ 400+ per cluster
— Users:
— few ~ tens per cluster
— Messages: 150 billion messages / day in largest cluster
— 3~ million / sec on peak
— No message is supposed to be lost, because every use case
is related to a service in some way
Challenges on multitenancy
For doing multitenancy, we have to ensure:
— Certain level of isolation among client workloads
— The cluster is proof against abusive clients
— We can track which client is sending a particular request
— We have to be confident in what we do, so that we can say
"don't worry" to people saying "we want a dedicated
cluster only for us!"
Engineering for achieving
multitenancy
Request Quota
— It's more important to manage the number of requests than the
incoming/outgoing byte rate
— Kafka is amazingly strong at handling large data if they
are well-batched
— => For consumers, responses are naturally batched
— => The main danger comes from producers configured with
linger.ms=0
— Starting from 0.11.0.0, KIP-124 lets us configure request
rate quotas 2
2 https://cwiki.apache.org/confluence/display/KAFKA/KIP-124+-+Request+rate+quotas
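The reason request count matters more than byte rate can be sketched with simple arithmetic. The peak of 3 million messages/sec comes from the slides above; the batch size of 100 messages per request and the linger.ms / batch.size values are illustrative assumptions, not LINE's settings. One message per request is the worst case for an unbatched producer, not a guaranteed outcome of linger.ms=0.

```java
import java.util.Properties;

final class BatchingSketch {
    /** Produce requests per second needed when each request carries batchSize messages. */
    static long requestsPerSecond(long messagesPerSec, long batchSize) {
        return (messagesPerSec + batchSize - 1) / batchSize; // ceiling division
    }

    /** Producer settings that give the client a chance to batch (values are assumptions). */
    static Properties batchingFriendlyProducerProps() {
        Properties props = new Properties();
        props.setProperty("linger.ms", "20");     // wait up to 20 ms to fill a batch
        props.setProperty("batch.size", "65536"); // upper bound of a per-partition batch, in bytes
        return props;
    }

    public static void main(String[] args) {
        long peak = 3_000_000L; // messages/sec at peak, from the slides
        System.out.println("worst case, unbatched: " + requestsPerSecond(peak, 1) + " req/s");
        System.out.println("100 msgs per request:  " + requestsPerSecond(peak, 100) + " req/s");
        System.out.println("producer props: " + batchingFriendlyProducerProps());
    }
}
```

The byte rate is the same in both cases; only the number of requests the broker threads must handle changes.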
Request Quota
— Manage the master copy of cluster config in YAML inside an Ansible repository
— Apply all at once during cluster provisioning with the kafka_config Ansible module
(developed internally)
— Can tell the latest config of a cluster w/o querying the cluster, and can keep change history
in git
---
kafka_cluster_configs:
  - entity_type: clients
    configs:
      request_percentage: 40
      producer_byte_rate: 1073741824
  - entity_type: clients
    entity_name: foobar-producer
    configs:
      request_percentage: 200
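A value such as request_percentage: 200 is valid because, per the Kafka documentation, a quota of n% means n% of one thread, out of a total capacity of (num.io.threads + num.network.threads) * 100%. The sketch below works that out for the two entries above. The thread counts are assumptions (Kafka's defaults of 8 and 3), not LINE's actual broker settings.

```java
final class RequestQuotaSketch {
    /** Total request-time capacity of one broker in percent, where 100 is one fully busy thread. */
    static int totalCapacityPercent(int numIoThreads, int numNetworkThreads) {
        return (numIoThreads + numNetworkThreads) * 100;
    }

    /** Share of one broker's thread time a client with this quota may consume (0.0 to 1.0). */
    static double shareOfBroker(int requestPercentage, int numIoThreads, int numNetworkThreads) {
        return (double) requestPercentage / totalCapacityPercent(numIoThreads, numNetworkThreads);
    }

    public static void main(String[] args) {
        int io = 8;  // assumed num.io.threads
        int net = 3; // assumed num.network.threads
        System.out.printf("default clients (40):  %.3f of broker%n", shareOfBroker(40, io, net));
        System.out.printf("foobar-producer (200): %.3f of broker%n", shareOfBroker(200, io, net));
        // producer_byte_rate in the YAML above is exactly 2^30 bytes/sec.
        System.out.println("producer_byte_rate = " + (1073741824L >> 30) + " GiB/s");
    }
}
```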
Slowlog
— Log requests which took longer than certain threshold to process
— Kafka has "request logging" but it produces too many lines
— Inspired by HBase's
# RequestChannel.scala#updateRequestMetrics
+ slowLogThresholdMap.get(metricNames.head).filter(_ >= 0).filter { v =>
+ val targetTime = requestId match {
+ case ApiKeys.FETCH.id => totalTime - apiRemoteTime
+ case _ => totalTime
+ }
+
+ targetTime >= v
+ }.foreach { _ =>
+ requestLogger.warn("Slow response:%s from connection %s;totalTime:%d...
+ .format(requestDesc(true), connectionId, totalTime, requestQueueTime...
+ }
[2016-12-26 16:04:20,135] WARN Slow response:Name: FetchRequest;
Version: 2 ... ;totalTime:1817;localTime: ...
Slowlog
— Thresholds can be changed dynamically through JMX console for each request type
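The slowlog decision can be sketched outside the broker as below. This is a minimal standalone illustration of the idea in the patch, not the patch itself: the class and method names are hypothetical, request types are plain strings rather than ApiKeys, and the JMX wiring is left out (setThresholdMs stands in for whatever setter an MBean would expose).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class SlowLogSketch {
    // Per-request-type thresholds in ms; a missing or negative value disables the slowlog.
    private final Map<String, Long> thresholdsMs = new ConcurrentHashMap<>();

    /** Safe to call at runtime from another thread, e.g. from a management interface. */
    void setThresholdMs(String requestType, long thresholdMs) {
        thresholdsMs.put(requestType, thresholdMs);
    }

    /** As in the patch above, remote time is subtracted for Fetch before comparing. */
    boolean isSlow(String requestType, long totalTimeMs, long apiRemoteTimeMs) {
        Long threshold = thresholdsMs.get(requestType);
        if (threshold == null || threshold < 0) {
            return false;
        }
        long targetTime = "Fetch".equals(requestType)
                ? totalTimeMs - apiRemoteTimeMs
                : totalTimeMs;
        return targetTime >= threshold;
    }

    public static void main(String[] args) {
        SlowLogSketch slowLog = new SlowLogSketch();
        slowLog.setThresholdMs("Fetch", 1000);
        System.out.println(slowLog.isSlow("Fetch", 1817, 0));    // nothing spent remotely
        System.out.println(slowLog.isSlow("Fetch", 1817, 1500)); // mostly remote wait
        System.out.println(slowLog.isSlow("Produce", 5000, 0));  // no threshold configured
    }
}
```

Keeping thresholds in a concurrent map is one simple way to let them change while request threads are reading them.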
The disk read by delayed consumer problem
— Detection: 99th %ile Produce response time became 50x ~ 100x slower
— A certain amount of disk read was occurring
— Network threads' utilization was very high
Suspecting sendfile is taking long...
— Because: 1. disk read was occurring at that time, 2. network threads' utilization was high
$ stap -e '(script counting sendfile(2) duration histogram)'
value |---------------------------------------- count
0 | 0
1 | 71
2 |@@@ 6171
16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 29472
32 |@@@ 3418
2048 | 0
...
8192 | 3
— Normal: 2 ~ 32us
— Outliers: 8ms ~
— (About SystemTap, see my previous presentation 3)
3 https://www.slideshare.net/kawamuray/kafka-meetup-jp-3-engineering-apache-kafka-at-line
Kafka broker's thread model
— Network threads (controlled by
num.network.threads) handle
reading requests and writing
responses
— Network threads hold
established connections
exclusively for event-driven IO
— Request handler threads
(controlled by num.io.threads)
handle request processing and
IO with block devices,
except sendfile(2) for Fetch
requests
When a Fetch request arrives for data that isn't present in
the page cache...
Problem definition
— network threads contain potentially-blocking ops
while they're supposed to work as an event loop
— and we have no way to know whether an upcoming sendfile(2)
will block awaiting disk read or not
It was one of the worst issues we had because:
— It completely breaks resource isolation among all
clients, including producers
— It occurs naturally when one of the consumers slows down
— We have to communicate with users every time to ask for
a fix
— It occurs 100% of the time when a broker restores log data from
the leader
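Why one blocking call hurts every client on the same network thread can be shown with a small deterministic model. It is a simplification: a single thread sends queued responses strictly in order, and the service times (16us for a cached read, 8192us for a cold one) are picked from the sendfile(2) histogram above purely for illustration.

```java
final class EventLoopDelaySketch {
    /**
     * Completion time of each response when one thread sends them in order.
     * serviceTimesUs[i] is how long the i-th send occupies the thread, in microseconds.
     */
    static long[] completionTimesUs(long[] serviceTimesUs) {
        long[] done = new long[serviceTimesUs.length];
        long clock = 0;
        for (int i = 0; i < serviceTimesUs.length; i++) {
            clock += serviceTimesUs[i];
            done[i] = clock;
        }
        return done;
    }

    public static void main(String[] args) {
        long[] allCached = {16, 16, 16, 16};
        long[] oneColdRead = {8192, 16, 16, 16}; // first response needs a disk read
        System.out.println("last response, all cached: "
                + completionTimesUs(allCached)[3] + " us");
        System.out.println("last response, one cold:   "
                + completionTimesUs(oneColdRead)[3] + " us");
    }
}
```

The responses queued behind the cold read belong to unrelated clients, including producers, yet they inherit its full delay.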
Solution candidates
— A: Separate network threads among clients
— => Possible, but a lot of changes required
— => Not essential because network threads should be
purely computation intensive
— B: Balance connections among network threads
— => Possible, but again a lot of changes
— => Still, at first other connections will get
affected
— C: Make sure that data is ready in memory before the
response is passed to the network thread
To make sure sendfile(2) is non-blocking in network
threads...
— The target data must be available on page cache
How?
NAME
sendfile - transfer data between file descriptors
SYNOPSIS
#include <sys/sendfile.h>
ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);
— sendfile(2) on Linux doesn't accept flags for controlling
its behavior
— Interestingly FreeBSD has such, by contribution from nginx
and Netflix 1
1 https://www.nginx.com/blog/nginx-and-netflix-contribute-new-sendfile2-to-freebsd/
So we have to:
1. Pre-read data not available in page cache from disks,
2. and confirm the pages' existence before passing
the response to network threads
sendfile(2) to the dest /dev/null
— Calling channel.transferTo("/dev/null") (== sendfile(/dev/null))
in the request handler thread might populate
the page cache?
— Tested it out, and figured out there's no noticeable
performance impact
How could it be that harmless?
— Linux kernel internally uses splice to implement sendfile(2)
— splice requests struct file_operations to handle splice
— struct file_operations null_fops just iterates over the list of page pointers, not over each
byte
— => Iteration count is SIZE / PAGE_SIZE(4k)
# ./drivers/char/mem.c
static int pipe_to_null(struct pipe_inode_info *info, struct pipe_buffer *buf,
struct splice_desc *sd)
{
return sd->len;
}
static ssize_t splice_write_null(struct pipe_inode_info *pipe,struct file *out,
loff_t *ppos, size_t len, unsigned int flags)
{
return splice_from_pipe(pipe, out, ppos, len, flags, pipe_to_null);
}
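To put the SIZE / PAGE_SIZE iteration count into numbers, here is a small calculation. The 4k page size is the one stated above; the 1 MiB read size is only an example, not a figure from the slides.

```java
final class SpliceCostSketch {
    static final long PAGE_SIZE = 4096; // 4k pages, as stated above

    /** Number of page pointers visited when `bytes` bytes are spliced to /dev/null. */
    static long pagesTouched(long bytes) {
        return (bytes + PAGE_SIZE - 1) / PAGE_SIZE; // round up to whole pages
    }

    public static void main(String[] args) {
        long oneMiB = 1L << 20; // example read size (assumption)
        System.out.println(oneMiB + " bytes -> " + pagesTouched(oneMiB) + " iterations");
    }
}
```

The work grows with the number of pages rather than the number of bytes, which is consistent with the small overhead reported above.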
Patching broker to call sendfile(/dev/null) in request
handler threads
# FileRecords.java
@SuppressWarnings("UnnecessaryFullyQualifiedName")
private static final java.nio.file.Path DEVNULL_PATH = new File("/dev/null").toPath();
public void prepareForRead() throws IOException {
long size = Math.min(channel.size(), end) - start;
try (FileChannel devnullChannel = FileChannel.open(DEVNULL_PATH,
java.nio.file.StandardOpenOption.WRITE)) {
channel.transferTo(start, size, devnullChannel);
}
}
— Still not fully-portable because it assumes underlying
kernel's implementation detail (so we haven't
contributed...)
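The same call can be tried outside the broker with a self-contained program. This is a sketch under assumptions: it needs a Unix-like system with /dev/null, the class and method names are hypothetical, and whether the JVM actually issues sendfile(2) for this transfer depends on the JDK and platform. It only demonstrates the FileChannel.transferTo usage, looping because transferTo may move fewer bytes than requested.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

final class PageCacheWarmupSketch {
    private static final Path DEVNULL = Paths.get("/dev/null");

    /** Transfers the whole file to /dev/null; returns the number of bytes transferred. */
    static long warmUp(Path file) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ);
             FileChannel devNull = FileChannel.open(DEVNULL, StandardOpenOption.WRITE)) {
            long size = in.size();
            long position = 0;
            while (position < size) {
                long n = in.transferTo(position, size - position, devNull);
                if (n <= 0) {
                    break; // nothing more could be transferred
                }
                position += n;
            }
            return position;
        }
    }

    /** Creates a temporary file of the given size, warms it up, and deletes it. */
    static long warmUpTempFile(int bytes) {
        try {
            Path tmp = Files.createTempFile("segment", ".log");
            try {
                Files.write(tmp, new byte[bytes]);
                return warmUp(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("transferred " + warmUpTempFile(1 << 20) + " bytes");
    }
}
```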
... and more to minimize impact of increased syscall...
# Log.scala#read
@@ -585,6 +586,17 @@ class Log(@volatile var dir: File,
if(fetchInfo == null) {
entry = segments.higherEntry(entry.getKey)
} else {
+ // For last entries we assume that it is hot enough to still have all data in page cache.
+ // Most of fetch requests are fetching from the tail of the log, so this optimization
+ // should save call of readahead() + mmap() + mincore() * N significantly.
+ if (!isLastEntry && fetchInfo.records.isInstanceOf[FileRecords]) {
+ try {
+ info("Prepare Read for " + fetchInfo.records.asInstanceOf[FileRecords].file().getPath)
+ fetchInfo.records.asInstanceOf[FileRecords].prepareForRead()
+ } catch {
+ case e: Throwable => warn("failed to prepare cache for read", e)
+ }
+ }
return fetchInfo
}
— Perform cache warmup only if the read segment IS NOT the latest
— => can save unnecessary syscalls for 99% of Fetch requests
After all
Conclusion
— Having fewer clusters enables us to concentrate on
reliability engineering and essential troubleshooting/fixes
— Preventive engineering enables us to keep operating
Kafka clusters in highest reliability even under high and
inexplicable load
— We've had some failures in development cluster, but
never in production cluster
— What's important in operating on-premise multitenancy: it's not
necessary to prevent 100% of failures, but never let the
same hole be punched again
End of presentation.
Questions?

Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 

Multitenancy: Kafka clusters for everyone at LINE

  • 1. Multitenancy: Kafka clusters for everyone at LINE Yuto Kawamura - LINE Corporation
  • 2. Speaker introduction — Yuto Kawamura — Senior Software Engineer at LINE — Leading project to redesign microservices architecture w/ Kafka — Apache Kafka Contributor — Speaker at Kafka Summit SF 2017 — Also at Kafka Meetup #3
  • 3. Outline — Kafka at LINE as of today (2018.04) — Challenges on multitenancy — Engineering for achieving multitenancy
  • 4. Kafka at LINE as of today (2018.04)
  • 5. We have more clusters — Added more clusters since last year to support: — Different DCs — Security sensitive data w/ SASL+TLS — They are separated by "purpose" but not by "user": this is our multitenancy strategy — Fewer clusters allow us to concentrate our engineering resources on maximizing their performance — They're conceptually the "Data Hub" too
  • 6. One cluster has many users — Topics: — 100 ~ 400+ per cluster — Users: — few ~ tens per cluster — Messages: 150 billion messages / day in largest cluster — 3~ million / sec at peak — None of the messages are supposed to be lost, because every usage is somehow related to the service
  • 8. For doing multitenancy, we have to ensure: — Certain level of isolation among client workloads — Cluster is abusing-client proof — Can track which client is sending a particular request — We have to be confident enough in what we do to say "don't worry" to people saying "we want a dedicated cluster only for us!"
  • 10. Request Quota — It's more important to manage the number of requests than the incoming/outgoing byte rate — Kafka is amazingly strong at handling large data if it is well-batched — => For consumers, responses are naturally batched — => The main danger is Producers that configure linger.ms=0 — Starting from 0.11.0.0, KIP-124 lets us configure a request rate quota (https://cwiki.apache.org/confluence/display/KAFKA/KIP-124+-+Request+rate+quotas)
  • 11. Request Quota — Manage the master copy of cluster config in YAML inside an Ansible repository — Apply all at once during cluster provisioning with the kafka_config Ansible module (developed internally) — Can tell the latest config on a cluster w/o querying the cluster, and can keep change history in git
      ---
      kafka_cluster_configs:
        - entity_type: clients
          configs:
            request_percentage: 40
            producer_byte_rate: 1073741824
        - entity_type: clients
          entity_name: foobar-producer
          configs:
            request_percentage: 200
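To help read the request_percentage values above, here is a minimal arithmetic sketch in Java. It assumes the KIP-124 interpretation in which a quota of n% means n% of one thread's time, so a broker's total capacity is (num.io.threads + num.network.threads) * 100%. The thread counts (8 and 3) and the 1-second window are illustrative assumptions, not values from the slides, and the class and method names are hypothetical.

```java
/** Back-of-the-envelope helper for reading request_percentage values (illustrative only). */
final class RequestQuotaMath {

    /** Total request-time capacity of one broker in percent: each thread counts as 100%. */
    static int brokerCapacityPercent(int numIoThreads, int numNetworkThreads) {
        return (numIoThreads + numNetworkThreads) * 100;
    }

    /** Thread time in milliseconds that a client may use per quota window before throttling. */
    static long budgetMillisPerWindow(double requestPercentage, long windowMillis) {
        return Math.round(windowMillis * requestPercentage / 100.0);
    }

    /** Fraction of the whole broker's thread time that a quota represents. */
    static double shareOfBroker(double requestPercentage, int capacityPercent) {
        return requestPercentage / capacityPercent;
    }

    public static void main(String[] args) {
        // Assumed thread counts; check num.io.threads and num.network.threads on your brokers.
        int capacity = brokerCapacityPercent(8, 3);
        System.out.println("broker capacity: " + capacity + "%");
        System.out.println("default client budget: " + budgetMillisPerWindow(40, 1000) + " ms per 1s window");
        System.out.println("foobar-producer budget: " + budgetMillisPerWindow(200, 1000) + " ms per 1s window");
        System.out.printf("foobar-producer share of broker: %.1f%%%n", 100 * shareOfBroker(200, capacity));
    }
}
```

Under these assumptions, the default of 40 gives each client less than half of one thread, while the override of 200 for foobar-producer allows it roughly two threads' worth of time.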
  • 12. Slowlog — Log requests which took longer than a certain threshold to process — Kafka has "request logging" but it produces too many lines — Inspired by HBase's slow log
      # RequestChannel.scala#updateRequestMetrics
      +      slowLogThresholdMap.get(metricNames.head).filter(_ >= 0).filter { v =>
      +        val targetTime = requestId match {
      +          case ApiKeys.FETCH.id => totalTime - apiRemoteTime
      +          case _ => totalTime
      +        }
      +
      +        targetTime >= v
      +      }.foreach { _ =>
      +        requestLogger.warn("Slow response:%s from connection %s;totalTime:%d...
      +          .format(requestDesc(true), connectionId, totalTime, requestQueueTime...
      +      }

      [2016-12-26 16:04:20,135] WARN Slow response:Name: FetchRequest; Version: 2 ... ;totalTime:1817;localTime: ...
  • 13. Slowlog — Thresholds can be changed dynamically through JMX console for each request type
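The slow-log idea from the two slides above can be sketched in standalone Java. This is not the broker patch itself. The class name, the use of plain strings for request types, and the setter (standing in for the JMX-exposed control) are all assumptions made for illustration. As in the patch, the time a Fetch request spends waiting remotely is subtracted before comparing against the threshold.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Minimal slow-log filter with per-request-type thresholds that can change at runtime. */
final class SlowLogSketch {
    // Request type name -> threshold in ms. A negative value disables logging for that type.
    private final Map<String, Long> thresholdsMs = new ConcurrentHashMap<>();

    /** Stand-in for a dynamically adjustable setting (the slides use JMX for this). */
    void setThreshold(String requestType, long thresholdMs) {
        thresholdsMs.put(requestType, thresholdMs);
    }

    /** Returns true if the request should be written to the slow log. */
    boolean isSlow(String requestType, long totalTimeMs, long remoteTimeMs) {
        Long threshold = thresholdsMs.get(requestType);
        if (threshold == null || threshold < 0) {
            return false; // no threshold configured, or logging disabled
        }
        // Fetch requests may wait on purpose for new data, so that wait is excluded.
        long targetTime = "Fetch".equals(requestType) ? totalTimeMs - remoteTimeMs : totalTimeMs;
        return targetTime >= threshold;
    }

    public static void main(String[] args) {
        SlowLogSketch slowLog = new SlowLogSketch();
        slowLog.setThreshold("Fetch", 500);
        if (slowLog.isSlow("Fetch", 1817, 0)) {
            System.out.println("WARN Slow response: Fetch totalTime:1817");
        }
    }
}
```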
  • 14. The disk read by delayed consumer problem — Detection: 50x ~ 100x slower response time in 99th %ile Produce response time — Disk read of certain amount — Network threads' utilization was very high
  • 15. Suspecting sendfile is taking long... — Because: 1. disk read was occurring at that time, 2. network threads' utilization was high
      $ stap -e '(script counting sendfile(2) duration histogram)'
      value |---------------------------------------- count
          0 |                                             0
          1 |                                            71
          2 |@@@                                       6171
         16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  29472
         32 |@@@                                       3418
       2048 |                                             0
        ...
       8192 |                                             3
      — Normal: 2 ~ 32us — Outliers: 8ms ~ — (About SystemTap, see my previous presentation: https://www.slideshare.net/kawamuray/kafka-meetup-jp-3-engineering-apache-kafka-at-line)
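The bucket values in the histogram above look like powers of two, which is what a log2 histogram (such as the one SystemTap's @hist_log produces) would print. A minimal Java sketch of that bucketing follows, assuming each duration in microseconds is filed under the largest power of two not exceeding it. The class name is hypothetical and this is not the script used in the talk.

```java
import java.util.Map;
import java.util.TreeMap;

/** Power-of-two histogram: a value v is counted in the bucket 2^floor(log2(v)). */
final class Log2Histogram {
    private final TreeMap<Long, Long> counts = new TreeMap<>();

    /** Records one duration; zero and negative values go to bucket 0. */
    void record(long micros) {
        long bucket = micros <= 0 ? 0 : Long.highestOneBit(micros);
        counts.merge(bucket, 1L, Long::sum);
    }

    /** Number of samples recorded in the given bucket. */
    long count(long bucket) {
        return counts.getOrDefault(bucket, 0L);
    }

    /** One line per non-empty bucket, smallest bucket first. */
    String render() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Long, Long> e : counts.entrySet()) {
            sb.append(String.format("%6d | %d%n", e.getKey(), e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Log2Histogram h = new Log2Histogram();
        for (long sample : new long[] {3, 20, 25, 31, 40, 8500}) {
            h.record(sample);
        }
        System.out.print(h.render());
    }
}
```

Read this way, the row labelled 16 covers calls that took 16 to 31 microseconds, and the row labelled 8192 covers the outliers of 8 ms and above.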
  • 16. Kafka broker's thread model — Network threads (controlled by num.network.threads) handle reading requests and writing responses — Network threads hold established connections exclusively for event-driven IO — Request handler threads (controlled by num.io.threads) handle request processing and IO with block devices, except sendfile(2) for Fetch requests
  • 17. When a Fetch request arrives for data that isn't present in the page cache...
  • 18. Problem definition — Network threads contain potentially blocking operations while they're supposed to work as an event loop — and we have no way to know whether an upcoming sendfile(2) will block awaiting a disk read or not
  • 19. It was one of the worst issues we had because it: — Completely breaks resource isolation among all clients, including producers — Occurs naturally when one of the consumers slows down — Forces us to communicate with users every time to ask for a fix — Occurs 100% of the time when a broker restores log data from the leader
  • 20. Solution candidates — A: Separate network threads among clients — => Possible, but a lot of changes required — => Not essential because network threads should be purely computation intensive — B: Balance connections among network threads — => Possible, but again a lot of changes — => For the first moment, other connections will still be affected — C: Make sure that the data is ready in memory before the response is passed to the network thread
  • 21. To make sure sendfile(2) doesn't block in network threads... — The target data must be available in the page cache
  • 22. How?
      NAME
             sendfile - transfer data between file descriptors
      SYNOPSIS
             #include <sys/sendfile.h>
             ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);
      — sendfile(2) on Linux doesn't accept flags for controlling its behavior
      — Interestingly, FreeBSD has such flags, contributed by nginx and Netflix (https://www.nginx.com/blog/nginx-and-netflix-contribute-new-sendfile2-to-freebsd/)
  • 23. So we have to: 1. Pre-read data not available in the page cache from disk, 2. and confirm the pages' existence before passing the response to network threads
  • 24. sendfile(2) to the dest /dev/null — Calling channel.transferTo("/dev/null") (== sendfile(/dev/null)) in the request handler thread might populate the page cache? — Tested it out, and found there's no noticeable performance impact
  • 25. How could it be that harmless? — Linux kernel internally uses splice to implement sendfile(2) — splice asks struct file_operations to handle the splice — struct file_operations null_fops just iterates over the list of page pointers, not over each byte — => Iteration count is SIZE / PAGE_SIZE(4k)
      # ./drivers/char/mem.c
      static int pipe_to_null(struct pipe_inode_info *info, struct pipe_buffer *buf,
                              struct splice_desc *sd)
      {
          return sd->len;
      }

      static ssize_t splice_write_null(struct pipe_inode_info *pipe, struct file *out,
                                       loff_t *ppos, size_t len, unsigned int flags)
      {
          return splice_from_pipe(pipe, out, ppos, len, flags, pipe_to_null);
      }
  • 26. Patching broker to call sendfile(/dev/null) in request handler threads
      # FileRecords.java
      @SuppressWarnings("UnnecessaryFullyQualifiedName")
      private static final java.nio.file.Path DEVNULL_PATH = new File("/dev/null").toPath();

      public void prepareForRead() throws IOException {
          long size = Math.min(channel.size(), end) - start;
          try (FileChannel devnullChannel = FileChannel.open(DEVNULL_PATH, java.nio.file.StandardOpenOption.WRITE)) {
              channel.transferTo(start, size, devnullChannel);
          }
      }
      — Still not fully-portable because it assumes underlying kernel's implementation detail (so we haven't contributed...)
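For readers who want to try the warm-up trick outside the broker, here is a self-contained Java sketch of the same idea. It assumes a Linux-like system where /dev/null exists. The class and method names are hypothetical, and unlike the patch above it loops because transferTo may move fewer bytes than requested. Which system call transferTo issues is a JDK implementation detail, so the page cache effect is not guaranteed on every platform.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/** Reads a file range through the kernel by transferring it to /dev/null (assumes Linux). */
final class PageCacheWarmup {
    private static final Path DEVNULL = Paths.get("/dev/null");

    /** Transfers up to length bytes starting at start; returns the number of bytes transferred. */
    static long warm(Path file, long start, long length) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ);
             FileChannel devNull = FileChannel.open(DEVNULL, StandardOpenOption.WRITE)) {
            long end = Math.min(in.size(), start + length);
            long position = start;
            while (position < end) {
                long n = in.transferTo(position, end - position, devNull);
                if (n <= 0) {
                    break; // nothing more could be transferred
                }
                position += n;
            }
            return Math.max(0, position - start);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("segment", ".log");
        try {
            Files.write(tmp, new byte[1 << 20]); // 1 MiB of zeros as a stand-in for a log segment
            System.out.println("warmed bytes: " + warm(tmp, 0, Files.size(tmp)));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```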
  • 27. ... and more to minimize impact of increased syscall...
      # Log.scala#read
      @@ -585,6 +586,17 @@ class Log(@volatile var dir: File,
               if(fetchInfo == null) {
                 entry = segments.higherEntry(entry.getKey)
               } else {
      +          // For last entries we assume that it is hot enough to still have all data in page cache.
      +          // Most of fetch requests are fetching from the tail of the log, so this optimization
      +          // should save call of readahead() + mmap() + mincore() * N significantly.
      +          if (!isLastEntry && fetchInfo.records.isInstanceOf[FileRecords]) {
      +            try {
      +              info("Prepare Read for " + fetchInfo.records.asInstanceOf[FileRecords].file().getPath)
      +              fetchInfo.records.asInstanceOf[FileRecords].prepareForRead()
      +            } catch {
      +              case e: Throwable => warn("failed to prepare cache for read", e)
      +            }
      +          }
                 return fetchInfo
               }
      — Perform cache warmup only if the read segment IS NOT the latest
      — => can save unnecessary syscalls for 99% of Fetch requests
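The excerpt above does not show how isLastEntry is computed. One plausible way to express the "warm up only when the segment is not the latest" rule is sketched below in Java, assuming segments are kept in a sorted map keyed by base offset. The class name and the use of strings as stand-in segment values are hypothetical, and this is not taken from the actual patch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

/** Decides whether a fetch at a given offset should trigger page cache warm-up. */
final class SegmentWarmupPolicy {

    /** True only if the segment containing fetchOffset exists and is not the newest one. */
    static boolean shouldWarm(ConcurrentNavigableMap<Long, String> segments, long fetchOffset) {
        Map.Entry<Long, String> entry = segments.floorEntry(fetchOffset);
        if (entry == null) {
            return false; // offset is before the first segment
        }
        // The newest (active) segment is assumed to be hot in the page cache already.
        return segments.higherEntry(entry.getKey()) != null;
    }

    public static void main(String[] args) {
        ConcurrentNavigableMap<Long, String> segments = new ConcurrentSkipListMap<>();
        segments.put(0L, "00000000000000000000.log");
        segments.put(1000L, "00000000000000001000.log");
        segments.put(2000L, "00000000000000002000.log");
        System.out.println("offset 500 needs warm-up: " + shouldWarm(segments, 500));
        System.out.println("offset 2500 needs warm-up: " + shouldWarm(segments, 2500));
    }
}
```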
  • 29. Conclusion — Having fewer clusters enables us to concentrate on reliability engineering and essential troubleshooting/fixes — Preventive engineering enables us to keep operating Kafka clusters with the highest reliability even under high and inexplicable load — We've had some failures in the development cluster, but never in the production cluster — The important thing in operating on-premise multitenancy: it is not necessary to prevent 100% of failures, but never let the same hole be punched again