How Much Memory Should Kafka Be Limited To? Kafka Memory Consumption

Reposted

lazihuman 2024-03-20 11:01:44

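I. Introduction

KafkaProducer was used to produce data, with streaming logs collected at an interval of 60 s. After calling KafkaProducer.send locally, the collected logs contained none of the expected records. While tracking down the cause, I also put together notes on the commonly used Kafka producer parameters.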

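II. Common parameters

For the full list of parameters and their defaults, see the org.apache.kafka.clients.producer.ProducerConfig class, which documents each parameter in more detail along with its type and default value.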

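1. buffer.memory

The Kafka client sends data to the server through a buffer: every message sent through KafkaProducer first goes into a client-side local buffer, and the Sender thread then sends the data to the broker in batches. buffer.memory defaults to 32 MB.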

define("buffer.memory", Type.LONG, 33554432L, ...)

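Potential problem:

If messages are written too fast or in too large a volume, the Sender thread cannot keep up and the buffer backs up. The user thread is then blocked and cannot write more messages to Kafka. A reasonable buffer.memory value generally has to be estimated from the business scenario.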
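A back-of-the-envelope calculation can help with that estimate. The sketch below assumes a constant produce rate and a constant rate at which the Sender drains the buffer, which real traffic rarely matches; the class and method names are made up for illustration and are not part of the Kafka client.

```java
final class BufferEstimate {
    /**
     * Seconds until a buffer of bufferBytes fills when the application
     * appends produceBytesPerSec and the Sender drains drainBytesPerSec.
     * Returns positive infinity when the Sender keeps up.
     */
    static double secondsUntilFull(long bufferBytes, long produceBytesPerSec, long drainBytesPerSec) {
        long net = produceBytesPerSec - drainBytesPerSec;
        if (net <= 0) {
            return Double.POSITIVE_INFINITY;
        }
        return (double) bufferBytes / net;
    }

    /** Buffer size needed to absorb a burst of burstSeconds without blocking. */
    static long bytesNeededForBurst(long produceBytesPerSec, long drainBytesPerSec, long burstSeconds) {
        long net = Math.max(0, produceBytesPerSec - drainBytesPerSec);
        return net * burstSeconds;
    }
}
```

With the default 32 MB buffer, producing 12 MB/s while the Sender drains 4 MB/s would fill the buffer in about 4 seconds under these assumptions.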

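2. batch.size

The buffer sized by buffer.memory holds many batches, each up to batch.size bytes. Normally one batch holds several records, and once a batch reaches batch.size it is ready to be sent. Raising batch.size can improve throughput, but raising it too far can leave too much data sitting in the buffer or increase the delay before data is sent.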

define("batch.size", Type.INT, 16384, ...)

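3. max.block.ms

If the buffer is full and a send has been blocked for this many milliseconds, the program throws an exception. At that point either increase the buffer size or adjust the program's write rate accordingly.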
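The interaction between a full buffer and max.block.ms can be modeled with a counting semaphore. This is only a toy model under my own assumptions, not the client's actual buffer pool: BoundedBuffer is a hypothetical class, and IllegalStateException stands in for whatever exception the real client reports.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Toy model of a fixed-size send buffer; not the Kafka client's implementation. */
final class BoundedBuffer {
    private final Semaphore freeBytes;
    private final long maxBlockMs;

    BoundedBuffer(int bufferBytes, long maxBlockMs) {
        this.freeBytes = new Semaphore(bufferBytes);
        this.maxBlockMs = maxBlockMs;
    }

    /** Reserve space for one record, waiting at most maxBlockMs for room. */
    void append(int recordBytes) {
        boolean reserved;
        try {
            reserved = freeBytes.tryAcquire(recordBytes, maxBlockMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for buffer space", e);
        }
        if (!reserved) {
            throw new IllegalStateException("no buffer space after " + maxBlockMs + " ms");
        }
    }

    /** Called once a record has been sent, freeing its space. */
    void release(int recordBytes) {
        freeBytes.release(recordBytes);
    }
}
```

In this model a caller that appends faster than release is called will wait, and then fail once the wait exceeds maxBlockMs, which is the behavior the parameter describes.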

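4. linger.ms

In some cases a batch takes a long time to fill and its data stays unsent. This is where linger.ms comes in: once linger.ms has passed since a batch was created, the batch is sent whether or not it is full. It is the maximum wait time, and it keeps a batch from sitting unsent because it never fills. The default is 0.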

define("linger.ms", Type.LONG, 0, ...)

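Potential problem:

If normal traffic fills a batch in 20 ms, this parameter should be set above 20. With anything below 20 ms, batches are sent before they are full, and batching loses most of its point.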
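The time needed to fill a batch can be approximated from the traffic on one partition. The helper below is a rough sketch that assumes a steady record rate and a fixed average record size, and ignores compression and per-record overhead; LingerEstimate and millisToFillBatch are illustrative names, not Kafka APIs.

```java
final class LingerEstimate {
    /**
     * Milliseconds needed to accumulate batchSizeBytes on one partition,
     * rounded up, given the average record size and the record rate.
     */
    static long millisToFillBatch(int batchSizeBytes, int avgRecordBytes, long recordsPerSecond) {
        long bytesPerSecond = (long) avgRecordBytes * recordsPerSecond;
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("traffic must be positive");
        }
        // Integer ceiling division avoids floating-point rounding surprises.
        return (batchSizeBytes * 1000L + bytesPerSecond - 1) / bytesPerSecond;
    }
}
```

For example, 800 records per second of 1024 bytes each fill the default 16384-byte batch in 20 ms, so a linger.ms a little above 20 would let batches fill under that load.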

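5. max.request.size

This parameter sets the maximum size of each request sent to the Kafka server. It also caps the size of a single message, which cannot exceed this value. The default is 1 MB.

define("max.request.size", Type.INT, 1048576, ...)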

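6. retries

The number of retries when a write fails.

7. reconnect.backoff.ms

The backoff time before reconnecting after a network failure, in ms.

8. serializer

key.serializer / value.serializer: how Kafka keys and values are serialized, for example StringSerializer or ByteArraySerializer.

9. compression.type

The compression codec for the data: "none", "gzip", "snappy" or "lz4". The default is "none".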

III. Analysis of the lost data

First, my Kafka version and the relevant configuration:

Version: 0.10.2.1

Parameters:

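import java.util.Properties

val props = new Properties()
// Compression codec for the data
props.put("compression.type", "snappy")
// Number of retries when a write fails
props.put("retries", "3")
// Backoff before reconnecting after a network failure, in milliseconds
props.put("reconnect.backoff.ms", "1000")
// Wait this long and send the records collected in that window as one batch
props.put("linger.ms", "1000")
// Key serializer
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
// Value serializer
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")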

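batch.size is not set here, so the default of 16384 bytes = 16 KB applies, and linger.ms = 1000 ms. The program ended before the buffered data was sent, so nothing was written to Kafka. There are two possible causes: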

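1. Data size

The data written is too small to reach batch.size, so the batch never counts as full and is not sent.

Verification:

The size of a Kafka record is the size of a ProducerRecord, which includes not only the payload being sent but also the ProducerRecord's metadata. It can be estimated with the org.apache.spark.util.SizeEstimator.estimate method, which measures the object's footprint on the JVM heap and so gives an approximation. To rule out the effect of compression, compression.type was changed to none (the snappy compression ratio is about 20%). There were 10 records of roughly 4000 bytes each, and 10 * 4000 > 16384, so data being too small is ruled out.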

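import org.apache.kafka.clients.producer.ProducerRecord
import org.apache.spark.util.SizeEstimator
val data = new ProducerRecord[String, String](topic, message)  // topic and message are defined elsewhere
val dataSize = SizeEstimator.estimate(data)  // estimated JVM heap size in bytes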
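As a cross-check on the heap-based estimate, the payload bytes can also be counted directly. The sketch below assumes String keys and values encoded as UTF-8, which is what StringSerializer uses by default, and it deliberately ignores per-record overhead added by the client; PayloadSize is a name invented for this example.

```java
import java.nio.charset.StandardCharsets;

final class PayloadSize {
    /** Bytes taken by the serialized key and value, excluding record overhead. */
    static int serializedBytes(String key, String value) {
        int keyBytes = key == null ? 0 : key.getBytes(StandardCharsets.UTF_8).length;
        int valueBytes = value == null ? 0 : value.getBytes(StandardCharsets.UTF_8).length;
        return keyBytes + valueBytes;
    }
}
```

Multiplying the result by the number of records gives a lower bound to compare against batch.size.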

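2. linger.ms delay

linger.ms is set too high, so the main program ends before the data has been sent and the send fails. This parameter makes the producer wait for a while so that the contents of the buffer can be sent as one batched request. The program finished before linger.ms had elapsed, so nothing was sent.

Verification:

As a deliberately extreme check, linger.ms was set to 0, and in testing the data was received normally. The official description follows for anyone interested: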

"The producer groups together any records that arrive in between request transmissions into a single batched request. "
+ "Normally this occurs only under load when records arrive faster than they can be sent out. However in some circumstances the client may want to "
+ "reduce the number of requests even under moderate load. This setting accomplishes this by adding a small amount "
+ "of artificial delay—that is, rather than immediately sending out a record the producer will wait for up to "
+ "the given delay to allow other records to be sent so that the sends can be batched together. This can be thought "
+ "of as analogous to Nagle's algorithm in TCP. This setting gives the upper bound on the delay for batching: once "
+ "we get <code>" + BATCH_SIZE_CONFIG + "</code> worth of records for a partition it will be sent immediately regardless of this "
+ "setting, however if we have fewer than this many bytes accumulated for this partition we will 'linger' for the "
+ "specified time waiting for more records to show up. This setting defaults to 0 (i.e. no delay). Setting <code>" + LINGER_MS_CONFIG + "=5</code>, "
+ "for example, would have the effect of reducing the number of requests sent but would add up to 5ms of latency to records sent in the absense of load.";

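3. Ways to make sure the data is sent

A. Lower linger.ms (recommended)

In a normal cluster environment linger.ms can be set to 200 ms - 300 ms, or tuned to your own scenario.

B. KafkaProducer.flush()

This method sends everything in the buffer, but it blocks the calling thread until the sends complete.

C. KafkaProducer.close()

This method closes the producer and sends everything in the buffer before closing. It also blocks.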
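The effect of options B and C can be shown with a small stand-in. BufferingSender below is a hypothetical class written for this illustration, not the Kafka client: it only mimics the fact that send buffers a record while flush and close push buffered records out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a buffering producer; not the Kafka client. */
final class BufferingSender implements AutoCloseable {
    private final List<String> pending = new ArrayList<>();
    private final List<String> delivered = new ArrayList<>();

    /** Like an asynchronous send: the record is only buffered. */
    void send(String record) {
        pending.add(record);
    }

    /** Push everything buffered so far to its destination. */
    void flush() {
        delivered.addAll(pending);
        pending.clear();
    }

    /** Flush before shutting down so nothing buffered is lost. */
    @Override
    public void close() {
        flush();
    }

    List<String> delivered() {
        return delivered;
    }
}

final class FlushDemo {
    /** Sends two records and relies on try-with-resources to close the sender. */
    static List<String> run() {
        BufferingSender sender = new BufferingSender();
        try (BufferingSender s = sender) {
            s.send("a");
            s.send("b");
        }
        return sender.delivered();
    }
}
```

With the real producer the same shape applies: call flush() or close() on the KafkaProducer before the program exits, so that records still waiting out linger.ms are sent.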

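IV. Batch creation logic

The sections above gave a brief look at the parameters and at the flow Kafka follows when writing data. One step in that flow is creating batches, which is governed by batch.size. Another parameter, max.request.size, caps the size of a single request, and normally batch.size < max.request.size. While the topic is fresh, here are the common batch creation cases: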

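1. A single record

A. Smaller than batch.size

The batch is sent once the linger.ms delay expires.

B. Between batch.size and max.request.size

A single batch larger than batch.size is created to hold the record and is sent.

C. Larger than max.request.size

An exception is raised.

2. Multiple records

A. N records = batch.size

The N records are packed into one batch and sent.

B. N records < batch.size

The N records are packed into one batch and sent after the linger.ms delay.

C. N + 1 records (N records < batch.size, N + 1 records > batch.size)

The first N records are packed into one batch and sent. The remaining record waits in a new batch until more data brings it up to batch.size or linger.ms expires.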
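The cases above can be reproduced with a simplified model for one partition. This is a sketch under my own assumptions, not the client's accumulator: it counts payload bytes only, ignores compression and per-record overhead, and treats the end of the input as the moment linger.ms expires. BatchModel and formBatches are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchModel {
    /** Groups record sizes into batches for one partition; each inner list is one batch. */
    static List<List<Integer>> formBatches(int[] recordSizes, int batchSize, int maxRequestSize) {
        List<List<Integer>> batches = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        int currentBytes = 0;
        for (int size : recordSizes) {
            // Case 1.C: a record larger than max.request.size is rejected.
            if (size > maxRequestSize) {
                throw new IllegalArgumentException(
                        "record of " + size + " bytes exceeds max.request.size");
            }
            // Case 2.C: close the open batch when the next record would not fit.
            if (!current.isEmpty() && currentBytes + size > batchSize) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            // Case 1.B: an oversized record still gets a batch of its own.
            current.add(size);
            currentBytes += size;
        }
        // Cases 1.A and 2.B: the last batch goes out when linger.ms expires.
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

For five records of 4000 bytes and the default batch.size of 16384, the model yields one batch of four records followed by one batch holding the fifth.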
