kafka高水位数据丢失 kafka高吞吐量

转载

mob6454cc7a6087 2024-07-25 19:40:11

文章标签 kafka高水位数据丢失 java kafka 数据缓存 文章分类 架构后端开发

kafka 高吞吐量介绍

零拷贝
将数据直接从磁盘文件复制到网卡设备中，而不需要经过应用程序，减少了内核和用户模式之间的上下文切换，依赖于底层的senffile()方法。

内核态是对系统硬件资源的控制，用户态是应用程序，上层应用。在系统调用的时候，会发生用户从用户态到内核态的上下文切换。

kafka高水位数据丢失 kafka高吞吐量_kafka高水位数据丢失

如果我们需要将数据展示给用户，我们就需要先将数据拷贝到内存，在从内存把数据放到socket，展示给用户。

kafka高水位数据丢失 kafka高吞吐量_缓存_02

通过sendfile系统调用，提供了零拷贝。数据通过DMA(Direct Memory Access)拷贝到内核态Buffer后，直接通过DMA拷贝到NIC Buffer，无需CPU拷贝。

PlaintextTransportLayer.class
@Override
public long transferFrom(FileChannel fileChannel, long position, long count) throws IOException {
    return fileChannel.transferTo(position, count, socketChannel);
}

磁盘顺序写和页缓存
kafka设计采用文件追加的方式，只能在文件的尾部追加新的消息，不允许修改已经写入消息。当broker接收到producer发送过来的消息的时候，根据消息的主题和分区情况，将该消息写入到该分区的最后的一个segment文件中。采用这种追加的方式避免了磁盘随机写入磁盘寻道的开销。
如果每条消息都执行一次磁盘写入，也还是会造成很大的I/O开销，kafka使用了大量的页缓存，消息先被写入到页缓存，然后由操作系统负责将数据刷到磁盘，kafka提供了同步刷盘和间歇性刷盘的功能。除了页缓存批量刷盘消息，kafka服务重启，页缓存的数据还是保持有效。
消息批量写入和消息压缩kafka消息压缩方式是将多条消息一起压缩，这样的压缩效果比较好；compression .type来配置压缩方式，有GZIP、SNAPPY、LS4，默认是不压缩的。kafka 消息格式

• public static int writeTo(DataOutputStream out,
                          int offsetDelta,
                          long timestampDelta,
                          ByteBuffer key,
                          ByteBuffer value,
                          Header[] headers) throws IOException {
    int sizeInBytes = sizeOfBodyInBytes(offsetDelta, timestampDelta, key, value, headers);
    ByteUtils.writeVarint(sizeInBytes, out);
    byte attributes = 0; // there are no used record attributes at the moment
    out.write(attributes); //属性
    ByteUtils.writeVarlong(timestampDelta, out); // 时间增量
    ByteUtils.writeVarint(offsetDelta, out); // 偏移增量
    if (key == null) {
        ByteUtils.writeVarint(-1, out);
    } else {
        int keySize = key.remaining();
        ByteUtils.writeVarint(keySize, out);
        Utils.writeTo(out, key, keySize); //key keySize
    }
    if (value == null) {
        ByteUtils.writeVarint(-1, out);
    } else {
        int valueSize = value.remaining();
        ByteUtils.writeVarint(valueSize, out);
        Utils.writeTo(out, value, valueSize); // value valueSize
    }
    if (headers == null)
        throw new IllegalArgumentException("Headers cannot be null");
    ByteUtils.writeVarint(headers.length, out);
    for (Header header : headers) {
        String headerKey = header.key();
        if (headerKey == null)
            throw new IllegalArgumentException("Invalid null header key found in headers");
        byte[] utf8Bytes = Utils.utf8(headerKey);
        ByteUtils.writeVarint(utf8Bytes.length, out);
        out.write(utf8Bytes);
        byte[] headerValue = header.value();
        if (headerValue == null) {
            ByteUtils.writeVarint(-1, out);
        } else {
            ByteUtils.writeVarint(headerValue.length, out); // headerValue.length
            out.write(headerValue); // headerValue
        }
    }
    return ByteUtils.sizeOfVarint(sizeInBytes) + sizeInBytes;
}

上图是一条消息的所有数据信息，采用很多的varInt、varLong等变长整型和ZigZag编码。会把消息追加到batch里面去。