Kafka in Depth (1): Kafka's High-Performance Design

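Background

After using Kafka for a while, I studied its internal structure and design and found a lot worth borrowing. Its design ideas can be applied to our own software, so I have organized my notes and recorded them here.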

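Prerequisites

You already have a basic understanding of Kafka, for example how to use the consumer and producer APIs.
You know Kafka's basic concepts, such as partitions, topics, recovery, and rebalance. Knowing the terms is enough; the article explains what each concept is for.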

1. Batch Model

According to the official documentation, two things hurt Kafka's performance:

too many small I/O operations, and excessive byte copying

To deal with small I/O operations, Kafka uses a batch model; to deal with byte copying, it uses zero-copy.
The batch model brings many benefits:

Batching leads to larger network packets, larger sequential disk operations, contiguous memory blocks, and so on, all of which allows Kafka to turn a bursty stream of random message writes into linear writes that flow to the consumers

[Figure: handling messages one at a time versus as a batch]


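As the figure above shows, if two messages have to be handled, the batch model is clearly cheaper. The following sections look at where Kafka uses the batch model.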

1.1 Producer API

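When writing to Kafka, batching is controlled by the batch.size and linger.ms parameters. The Kafka producer API keeps a buffer in memory, and once a batch reaches batch.size it is written to Kafka. The producer does not wait indefinitely for the buffer to fill, though: once linger.ms has elapsed, the batch is flushed whether or not it has reached the maximum size.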
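
As a rough illustration of how these two settings might be supplied, the sketch below builds a plain java.util.Properties object. The broker address, the serializer choice, and the specific values are assumptions made for the example, not recommendations, and the class name ProducerBatchConfig is hypothetical.

```java
import java.util.Properties;

final class ProducerBatchConfig {
    /** Builds producer settings that favor batching. Values are illustrative, not tuned. */
    static Properties batchingProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("batch.size", "32768"); // bytes per batch, not a message count
        props.setProperty("linger.ms", "10");     // wait up to 10 ms for more records
        return props;
    }

    public static void main(String[] args) {
        // Print the settings so the sketch runs without a Kafka dependency.
        batchingProps().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

With the kafka-clients library on the classpath, a Properties object like this would typically be passed to the KafkaProducer constructor.
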
See the following description of batch.size:

When multiple records are sent to the same partition, the producer will batch them
together. This parameter controls the amount of memory in bytes (not messages!)
that will be used for each batch. When the batch is full, all the messages in the batch
will be sent. However, this does not mean that the producer will wait for the batch to
become full. The producer will send half-full batches and even batches with just a single
message in them. Therefore, setting the batch size too large will not cause delays
in sending messages; it will just use more memory for the batches. Setting the batch
size too small will add some overhead because the producer will need to send messages more frequently.

See the following description of linger.ms:

linger.ms controls the amount of time to wait for additional messages before sending
the current batch. KafkaProducer sends a batch of messages either when the current
batch is full or when the linger.ms limit is reached. By default, the producer will
send messages as soon as there is a sender thread available to send them, even if
there’s just one message in the batch. By setting linger.ms higher than 0, we instruct
the producer to wait a few milliseconds to add additional messages to the batch
before sending it to the brokers. This increases latency but also increases throughput
(because we send more messages at once, there is less overhead per message).
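
The rule described above, send when the batch is full or when the linger time is up, can be sketched with a toy batcher. This is a simplified model written for this article, not Kafka's actual accumulator: ToyBatcher and its methods are hypothetical names, time is passed in by the caller, and "sending" just means moving the batch to a list.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the "batch full or linger elapsed" rule; not Kafka's real implementation. */
final class ToyBatcher {
    private final int batchSizeBytes; // plays the role of batch.size
    private final long lingerMs;      // plays the role of linger.ms
    private final List<byte[]> current = new ArrayList<>();
    private final List<List<byte[]>> sent = new ArrayList<>();
    private int currentBytes = 0;
    private long firstAppendMs = -1;

    ToyBatcher(int batchSizeBytes, long lingerMs) {
        this.batchSizeBytes = batchSizeBytes;
        this.lingerMs = lingerMs;
    }

    /** Adds a record at the given time; sends the open batch first if the record would not fit. */
    void append(byte[] record, long nowMs) {
        if (!current.isEmpty() && currentBytes + record.length > batchSizeBytes) {
            flush();
        }
        if (current.isEmpty()) {
            firstAppendMs = nowMs; // the linger clock starts with the first record
        }
        current.add(record);
        currentBytes += record.length;
        if (currentBytes >= batchSizeBytes) {
            flush();
        }
    }

    /** Called periodically: sends a partial batch once it has waited lingerMs. */
    void tick(long nowMs) {
        if (!current.isEmpty() && nowMs - firstAppendMs >= lingerMs) {
            flush();
        }
    }

    private void flush() {
        sent.add(new ArrayList<>(current));
        current.clear();
        currentBytes = 0;
        firstAppendMs = -1;
    }

    List<List<byte[]>> sentBatches() {
        return sent;
    }

    public static void main(String[] args) {
        ToyBatcher batcher = new ToyBatcher(10, 5);
        batcher.append(new byte[4], 0);
        batcher.append(new byte[4], 1);
        batcher.append(new byte[4], 2); // does not fit, so the first two records are sent
        batcher.tick(7);                // linger time is up, so the partial batch is sent
        System.out.println("batches sent: " + batcher.sentBatches().size());
    }
}
```

A larger batch size in this model only changes how many records can share a batch; the linger time is what bounds how long a record waits.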

1.2 Consumer API

The consumer API also uses the batch model, in two places:

  • when fetching messages from the broker
  • when committing offsets

1.2.1 Fetching messages from the broker

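The fetch.min.bytes and fetch.max.wait.ms parameters control the size of each fetched batch and how long a fetch may wait. Together they tell the broker to respond to the consumer only once enough data has accumulated, or once the wait limit is reached.
Description of fetch.min.bytes: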

This property allows a consumer to specify the minimum amount of data that it
wants to receive from the broker when fetching records. If a broker receives a request
for records from a consumer but the new records amount to fewer bytes than
fetch.min.bytes, the broker will wait until more messages are available before sending
the records back to the consumer. This reduces the load on both the consumer
and the broker as they have to handle fewer back-and-forth messages in cases where
the topics don’t have much new activity (or for lower activity hours of the day). You
will want to set this parameter higher than the default if the consumer is using too
much CPU when there isn’t much data available, or reduce load on the brokers when
you have large number of consumers.

fetch.max.wait.ms:

By setting fetch.min.bytes, you tell Kafka to wait until it has enough data to send
before responding to the consumer. fetch.max.wait.ms lets you control how long to
wait. By default, Kafka will wait up to 500 ms. This results in up to 500 ms of extra
latency in case there is not enough data flowing to the Kafka topic to satisfy the minimum amount of data to return. If you want to limit the potential latency (usually due
to SLAs controlling the maximum latency of the application), you can set
fetch.max.wait.ms to a lower value. If you set fetch.max.wait.ms to 100 ms and
fetch.min.bytes to 1 MB, Kafka will receive a fetch request from the consumer and
will respond with data either when it has 1 MB of data to return or after 100 ms,
whichever happens first.
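
The 1 MB / 100 ms example from the quote can be written down as a short sketch. The Properties keys are the two consumer settings discussed above; the class FetchSettings and the method shouldRespond are hypothetical, and shouldRespond is only a toy restatement of the "whichever happens first" rule, not broker code.

```java
import java.util.Properties;

final class FetchSettings {
    /** Consumer settings matching the example in the text: 1 MB minimum, 100 ms maximum wait. */
    static Properties fetchProps() {
        Properties props = new Properties();
        props.setProperty("fetch.min.bytes", String.valueOf(1024 * 1024)); // 1 MB
        props.setProperty("fetch.max.wait.ms", "100");
        return props;
    }

    /** Toy rule: respond when enough data is ready or the wait limit has been reached. */
    static boolean shouldRespond(long availableBytes, long waitedMs,
                                 long minBytes, long maxWaitMs) {
        return availableBytes >= minBytes || waitedMs >= maxWaitMs;
    }

    public static void main(String[] args) {
        long minBytes = Long.parseLong(fetchProps().getProperty("fetch.min.bytes"));
        long maxWaitMs = Long.parseLong(fetchProps().getProperty("fetch.max.wait.ms"));
        // Little data and little waiting: the response is held back.
        System.out.println(shouldRespond(10, 20, minBytes, maxWaitMs));
        // Little data but the wait limit is reached: the response is sent anyway.
        System.out.println(shouldRespond(10, 100, minBytes, maxWaitMs));
    }
}
```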


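2. Persistence

Persistence is closer to hardware and the operating system and is not yet something we can use in our own designs, so this section is only a brief record without further discussion.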

2.1 filesystem

For persistence, Kafka writes to disk sequentially. How the hardware implements this is not discussed here.

2.2 pagecache

This is a feature of modern operating systems; see http://varnish-cache.org/docs/trunk/phk/notes.html

2.3 persistent queue

This lets Kafka's operations run in O(1) time. The persistent queue itself still needs further study on my part, so it is only noted here.
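
One way to picture why a queue built on file appends can be O(1) is a toy append-only log. The sketch below is an assumption-laden illustration, not Kafka's storage format: ToyLog, its length-prefixed record layout, and addressing records by byte position are all made up for this example. The point is that neither append nor read has to scan the existing data.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Toy append-only log: length-prefixed records, addressed by byte position. */
final class ToyLog implements AutoCloseable {
    private final FileChannel channel;

    ToyLog(Path file) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
    }

    /** Appends one record at the end of the file and returns its starting position. */
    long append(byte[] payload) throws IOException {
        long position = channel.size();
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length);
        buf.putInt(payload.length).put(payload).flip();
        while (buf.hasRemaining()) {
            channel.write(buf, position + buf.position());
        }
        return position;
    }

    /** Reads the record that starts at the given position, without scanning earlier records. */
    byte[] read(long position) throws IOException {
        ByteBuffer header = ByteBuffer.allocate(4);
        readFully(header, position);
        header.flip();
        ByteBuffer body = ByteBuffer.allocate(header.getInt());
        readFully(body, position + 4);
        return body.array();
    }

    private void readFully(ByteBuffer buf, long position) throws IOException {
        while (buf.hasRemaining()) {
            if (channel.read(buf, position + buf.position()) < 0) {
                throw new IOException("unexpected end of log");
            }
        }
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }

    /** Writes two records to a temp file and reads the second one back by position. */
    static String demo() {
        try {
            Path file = Files.createTempFile("toy-log", ".bin");
            try (ToyLog log = new ToyLog(file)) {
                log.append("first".getBytes(StandardCharsets.UTF_8));
                long second = log.append("second".getBytes(StandardCharsets.UTF_8));
                return new String(log.read(second), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```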

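3. Zero-Copy

Zero-copy is common in other I/O frameworks as well, such as Netty. It is implemented at the operating-system level; the idea is to avoid copying data from the kernel into user-space buffers, which reduces CPU usage.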

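Without zero-copy, data sent to a socket is copied four times: from disk into the kernel page cache, from the page cache into a user-space buffer, from the user-space buffer into the kernel socket buffer, and from the socket buffer into the NIC buffer.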


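Switching between user mode and kernel mode costs both resources and time. With zero-copy, the data is copied directly within kernel space, which avoids those extra switches.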


At the operating-system level, the system call used for this is sendfile().
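
In Java, the usual way to reach this kind of transfer is FileChannel.transferTo. The sketch below is a minimal example under stated assumptions: ZeroCopySketch, sendFile, and demo are hypothetical names, and the demo copies one temp file into another so that it runs without a network. Whether the operating system actually uses sendfile() depends on the platform and on the kind of target channel; a server would typically pass a SocketChannel as the target.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ZeroCopySketch {
    /** Sends the whole file to the target channel without reading it into a Java buffer. */
    static long sendFile(Path source, WritableByteChannel target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ)) {
            long size = in.size();
            long sent = 0;
            while (sent < size) {
                // transferTo may move fewer bytes than requested, so loop until done.
                sent += in.transferTo(sent, size - sent, target);
            }
            return sent;
        }
    }

    /** Demo: transfers a temp file into a second temp file and returns what arrived. */
    static String demo() {
        try {
            Path src = Files.createTempFile("zero-copy-src", ".txt");
            Path dst = Files.createTempFile("zero-copy-dst", ".txt");
            Files.write(src, "hello kafka".getBytes(StandardCharsets.UTF_8));
            try (FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE)) {
                sendFile(src, out);
            }
            return new String(Files.readAllBytes(dst), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```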

4. Compression (End-to-end Batch Compression)

Kafka also compresses data for network transfer; the algorithm is configured with the compression.type parameter.

By default, messages are sent uncompressed. This parameter can be set to snappy, gzip, or lz4, in which case the corresponding compression algorithms will be used to compress the data before sending it to the brokers. Snappy compression was invented by Google to provide decent compression ratios with low CPU overhead and good performance, so it is recommended in cases where both performance and bandwidth are a concern. Gzip compression will typically use more CPU and time but result in better compression ratios, so it is recommended in cases where network bandwidth is more restricted. By enabling compression, you reduce network utilization and storage, which is often a bottleneck when sending messages to Kafka.
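
To see why compressing a whole batch tends to pay off more than compressing each message on its own, the sketch below compares the two with the JDK's GZIPOutputStream. It is only a stand-in for illustration: Kafka does the compression itself when compression.type is set, and the class BatchCompressionSketch and the sample JSON messages are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPOutputStream;

final class BatchCompressionSketch {
    /** Returns the gzip-compressed size of the given bytes. */
    static int gzipSize(byte[] data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    /** Builds similar-looking messages, as is common when records share field names. */
    static List<byte[]> sampleMessages(int count) {
        List<byte[]> messages = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String json = "{\"user\":\"user-" + i + "\",\"event\":\"page_view\",\"page\":\"/home\"}";
            messages.add(json.getBytes(StandardCharsets.UTF_8));
        }
        return messages;
    }

    /** Total size when every message is compressed separately. */
    static int compressedOneByOne(List<byte[]> messages) {
        int total = 0;
        for (byte[] message : messages) {
            total += gzipSize(message);
        }
        return total;
    }

    /** Size when all messages are concatenated and compressed once. */
    static int compressedAsBatch(List<byte[]> messages) {
        ByteArrayOutputStream joined = new ByteArrayOutputStream();
        for (byte[] message : messages) {
            joined.write(message, 0, message.length);
        }
        return gzipSize(joined.toByteArray());
    }

    public static void main(String[] args) {
        List<byte[]> messages = sampleMessages(100);
        System.out.println("one by one: " + compressedOneByOne(messages) + " bytes");
        System.out.println("as a batch: " + compressedAsBatch(messages) + " bytes");
    }
}
```

The batch wins because the repeated field names across messages only become visible to the compressor when the messages are compressed together.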