Continuing my Kafka learning journey. The material is still the book 《深入理解Kafka:核心设计与实践原理》 (Understanding Kafka: Core Design and Practical Principles). The previous post covered consumers, and this one stays with consumers.
auto.offset.reset. This parameter comes into play when the consumer cannot find a committed offset for one of its assigned partitions in the __consumer_offsets topic. It has three valid values: latest, earliest, and none. latest means that when no offset is found, consumption starts from the tail of the log; earliest starts from the head of the log; none starts from neither end and instead throws a NoOffsetForPartitionException.
The parameter is not only used when no committed offset can be found. It also applies when an offset is found but is out of range, that is, there is no message at that position (for example, because the log at that position has already been deleted).
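For reference, this is how the parameter is set in the consumer configuration; the constant ConsumerConfig.AUTO_OFFSET_RESET_CONFIG and the three string values are part of the standard client API:

properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); //or "latest" (the default) or "none"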
Still, as you can see, the granularity of this parameter is very coarse: it is either the head of the log or the tail. Is there a way to choose the consumption position ourselves?
There is.
Kafka provides the seek method, which lets us start consuming from an arbitrary position within a partition.
Its signature is seek(TopicPartition partition, long offset). As we saw earlier, a TopicPartition object has two fields, topic and partition. Combined with the offset, that is enough to pinpoint a single position: a given topic, a given partition, and a location in the log of that partition's leader replica.
Here, offset means the message offset within the partition.
Let's look at an example of using the seek method:

protected static Properties initConfig(){
        Properties properties = new Properties();
        properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList);
        properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        properties.put(ConsumerConfig.GROUP_ID_CONFIG,groupId);
        properties.put(ConsumerConfig.CLIENT_ID_CONFIG,clientId);
        return properties;
    }

public static void main(String[] args) {
        KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<String, String>(initConfig());
        kafkaConsumer.subscribe(Arrays.asList(topic));
        //partition assignment happens inside poll(), so we must call poll()
        //at least once before seek() to learn which partitions we own
        Set<TopicPartition> assignment = new HashSet<>();
        while (assignment.size() == 0) {
            //keep polling until partitions have been assigned
            kafkaConsumer.poll(100L);
            assignment = kafkaConsumer.assignment();
        }
        for (TopicPartition tp : assignment) {
            //start consuming this partition from offset 10
            kafkaConsumer.seek(tp, 10);
        }
        while (isRunning.get()) {
            ConsumerRecords<String, String> consumerRecords = kafkaConsumer.poll(2000L);
            System.out.println("records fetched in this poll: " + consumerRecords.count());
            System.out.println("record set empty: " + consumerRecords.isEmpty());
            for (ConsumerRecord<String, String> consumerRecord : consumerRecords) {
                System.out.println("consumed record key: " + consumerRecord.key() + ", value: " + consumerRecord.value() + ", offset: " + consumerRecord.offset());
            }
        }
    }
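
Incidentally, if what we want really is head-of-log or tail-of-log consumption, just triggered by our own code rather than by auto.offset.reset, KafkaConsumer also offers seekToBeginning and seekToEnd. A minimal sketch, reusing the assignment set obtained above:

//jump to the beginning of every assigned partition
kafkaConsumer.seekToBeginning(assignment);
//or jump to the end instead
//kafkaConsumer.seekToEnd(assignment);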

The scenario above assumes we know the exact position to consume from. What if we don't? In day-to-day development we often want to start consuming from a point in time, for example from a certain moment yesterday.
Kafka provides the offsetsForTimes method, which returns, for each partition, the offset and timestamp of the first message whose timestamp is greater than or equal to the given time. Once we have that offset, we can use seek to start consuming from that point in time. For example:

public static void main(String[] args) {
        KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(initConfig());
        kafkaConsumer.subscribe(Arrays.asList(topic));
        Set<TopicPartition> assignment = new HashSet<>();
        while (assignment.size() == 0) {
            kafkaConsumer.poll(100L);
            assignment = kafkaConsumer.assignment();
        }
        //for each partition, ask for the offset of the first message whose
        //timestamp is at most one day old
        Map<TopicPartition, Long> map = new HashMap<>();
        for (TopicPartition tp : assignment) {
            map.put(tp, System.currentTimeMillis() - 1 * 24 * 3600 * 1000);
        }
        Map<TopicPartition, OffsetAndTimestamp> offsets = kafkaConsumer.offsetsForTimes(map);
        for (TopicPartition topicPartition : offsets.keySet()) {
            OffsetAndTimestamp offsetAndTimestamp = offsets.get(topicPartition);
            //offsetsForTimes maps a partition to null when no message has a
            //timestamp >= the requested time, so guard against that
            if (offsetAndTimestamp != null) {
                kafkaConsumer.seek(topicPartition, offsetAndTimestamp.offset());
            }
        }
        while (isRunning.get()) {
            ConsumerRecords<String, String> consumerRecords = kafkaConsumer.poll(1000L);
            System.out.println("records fetched in this poll: " + consumerRecords.count());
            System.out.println("record set empty: " + consumerRecords.isEmpty());
            for (ConsumerRecord<String, String> consumerRecord : consumerRecords) {
                System.out.println("consumed record key: " + consumerRecord.key() + ", value: " + consumerRecord.value() + ", offset: " + consumerRecord.offset());
            }
        }
    }
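
Along the same lines, the consumer's beginningOffsets and endOffsets methods report the current head and tail offsets of each partition. Here is a small sketch of my own (not from the book) that combines endOffsets with seek to replay only the last 100 messages of each assigned partition; the number 100 is arbitrary:

//query the log end offset of every assigned partition
Map<TopicPartition, Long> endOffsets = kafkaConsumer.endOffsets(assignment);
for (TopicPartition tp : assignment) {
    //rewind 100 messages from the end, but never past the beginning
    kafkaConsumer.seek(tp, Math.max(0, endOffsets.get(tp) - 100));
}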

Since the seek method only cares about a partition and an offset, we are free to store each partition together with the next offset to consume in a database, like this:

public static void main(String[] args) {
        KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(initConfig());
        kafkaConsumer.subscribe(Arrays.asList(topic));
        Set<TopicPartition> assignment = new HashSet<>();
        while (assignment.size() == 0) {
            kafkaConsumer.poll(100L);
            assignment = kafkaConsumer.assignment();
        }
        for (TopicPartition tp : assignment) {
            //resume from the offset we persisted ourselves
            Long offset = getOffsetFromDB(tp);
            kafkaConsumer.seek(tp, offset);
        }
        while (isRunning.get()) {
            ConsumerRecords<String, String> consumerRecords = kafkaConsumer.poll(1000L);
            Set<TopicPartition> partitions = consumerRecords.partitions();
            for (TopicPartition tp : partitions) {
                List<ConsumerRecord<String, String>> records = consumerRecords.records(tp);
                for (ConsumerRecord<String, String> record : records) {
                    //process record
                }
                //persist the position of the NEXT message to consume,
                //hence lastConsumedOffset + 1
                long lastConsumedOffset = records.get(records.size() - 1).offset();
                storeOffsetToDB(tp, lastConsumedOffset + 1);
            }
        }
    }

    private static void storeOffsetToDB(TopicPartition tp, Long offset) {
        //placeholder: persist (topic, partition, offset) to your storage
    }

    private static Long getOffsetFromDB(TopicPartition tp) {
        //placeholder: load the stored offset; return 0L when nothing is
        //stored yet (returning null would NPE when unboxed by seek)
        return 0L;
    }
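
To make that concrete, here is a minimal sketch of what the two stubs might look like with plain JDBC (requires import java.sql.*;). The table kafka_offsets(topic, partition_id, next_offset), the connection details, and the use of MySQL's REPLACE INTO upsert are my own assumptions, not from the book:

    private static final String JDBC_URL = "jdbc:mysql://localhost:3306/demo"; //hypothetical

    private static void storeOffsetToDB(TopicPartition tp, Long offset) {
        String sql = "REPLACE INTO kafka_offsets(topic, partition_id, next_offset) VALUES (?, ?, ?)";
        try (Connection conn = DriverManager.getConnection(JDBC_URL, "user", "password");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, tp.topic());
            ps.setInt(2, tp.partition());
            ps.setLong(3, offset);
            ps.executeUpdate();
        } catch (SQLException e) {
            throw new RuntimeException("failed to store offset for " + tp, e);
        }
    }

    private static Long getOffsetFromDB(TopicPartition tp) {
        String sql = "SELECT next_offset FROM kafka_offsets WHERE topic = ? AND partition_id = ?";
        try (Connection conn = DriverManager.getConnection(JDBC_URL, "user", "password");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, tp.topic());
            ps.setInt(2, tp.partition());
            try (ResultSet rs = ps.executeQuery()) {
                //fall back to offset 0 when nothing has been stored yet
                return rs.next() ? rs.getLong(1) : 0L;
            }
        } catch (SQLException e) {
            throw new RuntimeException("failed to load offset for " + tp, e);
        }
    }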