Reference: 深入理解Kafka:核心设计与实践原理 (Understanding Kafka in Depth: Core Design and Practical Principles)
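5. Partitioning
Partitioner: assigns each message to a partition. On its way from send() to the broker, a message may pass through interceptors, a serializer, and a partitioner before it reaches the broker.
If the message does not specify a partition, the partitioner decides which partition it goes to, computing the partition number from the key field.
The partitioner hashes the key with the MurmurHash2 algorithm and derives the partition number from the hash value, so messages with the same key are sent to the same partition (this is what was meant earlier by "the same key can be sent to the same partition"). If the key is null, messages are sent round-robin to each available partition of the topic. Note that Kafka clients from version 2.4 onward default to sticky partitioning for null keys rather than strict round-robin.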
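The selection rule described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Kafka's own partitioner: `SketchPartitioner` and its methods are hypothetical names, `Arrays.hashCode` stands in for MurmurHash2 so the example needs no dependencies, and the null-key branch rotates over all partitions instead of tracking which ones are currently available.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of key-based partition selection; not Kafka's actual class. */
final class SketchPartitioner {
    // Counter used to rotate over partitions when a record has no key.
    private final AtomicInteger counter = new AtomicInteger(0);

    /**
     * Picks a partition in [0, numPartitions).
     * Assumption: Arrays.hashCode is a stand-in for MurmurHash2.
     */
    int partition(String key, int numPartitions) {
        if (key == null) {
            // No key: round-robin over the partitions.
            return toPositive(counter.getAndIncrement()) % numPartitions;
        }
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        // Same key bytes -> same hash -> same partition, as long as numPartitions is unchanged.
        return toPositive(Arrays.hashCode(keyBytes)) % numPartitions;
    }

    // Clears the sign bit so the modulo result is never negative.
    private static int toPositive(int n) {
        return n & 0x7fffffff;
    }
}
```

Because the partition is the hash modulo the partition count, the key-to-partition mapping only holds while the number of partitions stays the same; adding partitions to a topic can move a key to a different partition.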
/**
 * Create a record to be sent to Kafka
 * (With a key: the key is hashed, so messages with the same key are assigned to the same partition, and each partition is ordered.)
 * @param topic The topic the record will be appended to
 * @param key The key that will be included in the record
 * @param value The record contents
 */
public ProducerRecord(String topic, K key, V value) {
    this(topic, null, null, key, value, null);
}

/**
 * Create a record with no key
 * (Without a key: records are sent round-robin to each available partition.)
 * @param topic The topic this record should be sent to
 * @param value The record contents
 */
public ProducerRecord(String topic, V value) {
    this(topic, null, null, null, value, null);
}
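6. Interceptors
Interceptors come in two kinds: producer interceptors and consumer interceptors. A producer interceptor can do preparatory work before a message is sent, such as filtering out some data.
1. Analysis of the ProducerInterceptor source: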
public interface ProducerInterceptor<K, V> extends Configurable {
    // Any exception thrown by this method will be caught by the caller and logged, but not propagated further.
    // Since the producer may run multiple interceptors, a particular interceptor's onSend() callback will be called
    // in the order specified by ProducerConfig.INTERCEPTOR_CLASSES_CONFIG.
    // @param record the record from client or the record returned by the previous interceptor in the chain of interceptors.
    public ProducerRecord<K, V> onSend(ProducerRecord<K, V> record);

    // This method is called when the record sent to the server has been acknowledged, or when sending the record fails
    // before it gets sent to the server.
    // This method will generally execute in the background I/O thread, so the implementation should be reasonably fast.
    // Otherwise, sending of messages from other threads could be delayed.
    public void onAcknowledgement(RecordMetadata metadata, Exception exception);

    public void close();
}
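Execution order: before serializing a message and computing its partition, KafkaProducer calls the onSend() method of the producer interceptors so that they can process the message.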
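The chain behaviour described in the interface comments (interceptors run in configured order, each receives the result of the previous one, and an exception is logged rather than propagated) can be sketched without any Kafka dependency. This is a simplified model under stated assumptions: `InterceptorChainSketch` is a hypothetical name, records are reduced to plain `String` values, and "logging" is modelled as collecting messages in a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of an interceptor chain; not Kafka's internal implementation. */
final class InterceptorChainSketch {
    private final List<UnaryOperator<String>> interceptors;
    // Stand-in for a log: messages of exceptions thrown by interceptors.
    final List<String> errors = new ArrayList<>();

    InterceptorChainSketch(List<UnaryOperator<String>> interceptors) {
        this.interceptors = interceptors;
    }

    /** Runs every interceptor in list order; a failing one is recorded and skipped. */
    String onSend(String value) {
        String current = value;
        for (UnaryOperator<String> interceptor : interceptors) {
            try {
                // Each interceptor sees the value returned by the previous successful one.
                current = interceptor.apply(current);
            } catch (Exception e) {
                // Recorded but not propagated, so the remaining interceptors still run.
                errors.add(e.getMessage());
            }
        }
        return current;
    }
}
```

With a chain of three interceptors where the second one throws, the value passes through the first and third only, and one error is recorded.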
Example:
An interceptor that prepends a timestamp to the message
package com.paojiaojiang.interceptor;

import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

import java.util.Map;

/**
 * @Author: jja
 * @Description: Interceptor that prepends a timestamp to the message
 * @Date: 2019/3/20 23:53
 */
public class TimeInterceptor implements ProducerInterceptor<String, String> {
    @Override
    public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
        // Build a new record whose value starts with the current timestamp
        String value = System.currentTimeMillis() + "," + record.value();
        return new ProducerRecord<>(record.topic(), record.partition(), record.timestamp(), record.key(), value, record.headers());
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) {
        // Nothing to do on acknowledgement
    }

    @Override
    public void close() {
        // No resources to release
    }

    @Override
    public void configure(Map<String, ?> configs) {
        // No configuration needed
    }
}
Counting successful and failed sends
package com.paojiaojiang.interceptor;

import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

import java.util.Map;

/**
 * @Author: jja
 * @Description: Counts successful and failed sends and prints the totals before the producer closes
 * @Date: 2019/3/21 0:02
 */
public class CountInterceptor implements ProducerInterceptor<String, String> {
    private int errorCount = 0;
    private int successCount = 0;

    @Override
    public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
        // Pass the record through unchanged; returning null here would break the send
        return record;
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) {
        // A null exception means the record was acknowledged successfully
        if (exception == null) {
            successCount++;
        } else {
            errorCount++;
        }
    }

    @Override
    public void close() {
        System.out.println("Successful sends: " + successCount);
        System.out.println("Failed sends: " + errorCount);
    }

    @Override
    public void configure(Map<String, ?> configs) {
        // No configuration needed
    }
}
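Since onAcknowledgement() generally runs on the background I/O thread while close() may be called from another thread, the counters are shared across threads. One way to make the counting thread-safe is to use atomic counters, sketched below without any Kafka dependency. `AckCounter` and its method names are hypothetical and only illustrate the idea; an interceptor could delegate to such a helper.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper showing thread-safe success/failure counting. */
final class AckCounter {
    private final AtomicLong success = new AtomicLong();
    private final AtomicLong failure = new AtomicLong();

    /** Follows the onAcknowledgement convention: a null exception means the send succeeded. */
    void record(Exception exception) {
        if (exception == null) {
            success.incrementAndGet();
        } else {
            failure.incrementAndGet();
        }
    }

    long successCount() {
        return success.get();
    }

    long failureCount() {
        return failure.get();
    }

    /** One-line summary suitable for printing from close(). */
    String summary() {
        return "Successful sends: " + success.get() + ", failed sends: " + failure.get();
    }
}
```

The atomic increments avoid lost updates if acknowledgements are ever processed concurrently, and the reads in summary() are guaranteed to see the latest values.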
Producer that registers the interceptors:
package com.paojiaojiang.interceptor;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/**
 * @Author: jja
 * @Description: Producer that sends messages through the interceptor chain. It uses the Java client
 *               (KafkaProducer), because the legacy kafka.javaapi.producer.Producer ignores interceptor.classes.
 * @Date: 2019/3/21 0:12
 */
public class InterceptorProducer implements Runnable {
    public static String TOPIC = "paojiaopjiang";
    private final Properties props = new Properties();

    public InterceptorProducer() {
        // Kafka brokers used to fetch metadata; not every broker has to be listed
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "spark:9092,spark1:9092,spark2:9092");
        // Serializers for the key and the value
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Compress with gzip; the codec is recorded with the messages, so consumers decompress transparently
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "gzip");
        // Build the interceptor chain; interceptors are invoked in list order
        List<String> interceptors = new ArrayList<>();
        // interceptors.add("com.paojiaojiang.interceptor.CountInterceptor"); // counting interceptor
        interceptors.add("com.paojiaojiang.interceptor.TimeInterceptor"); // timestamp interceptor
        props.put(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG, interceptors);
    }

    @Override
    public void run() {
        // Closing the producer flushes pending records and calls close() on each interceptor
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 1; i <= 3; i++) { // three distinct keys
                for (int j = 0; j < 10; j++) { // ten messages per key
                    // ProducerRecord(topic, key, value): the key determines the partition
                    producer.send(new ProducerRecord<>(TOPIC, "partition[----" + i + "]", "message[----The " + i + "------ message]" + TOPIC));
                }
                System.out.println(TOPIC);
            }
        }
    }

    public static void main(String[] args) {
        Thread t = new Thread(new InterceptorProducer());
        t.start();
    }
}
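For the Java client, the interceptor chain is configured through the `interceptor.classes` property (the string value behind ProducerConfig.INTERCEPTOR_CLASSES_CONFIG), which also accepts a comma-separated list of class names. The sketch below builds such a configuration using only java.util.Properties; `InterceptorConfigSketch` and `build` are hypothetical names, and the broker addresses are simply the ones used in this post.

```java
import java.util.List;
import java.util.Properties;

/** Hypothetical helper that assembles producer properties with an interceptor chain. */
final class InterceptorConfigSketch {
    /** The order of interceptorClasses is the order in which onSend() will be invoked. */
    static Properties build(List<String> interceptorClasses) {
        Properties props = new Properties();
        // Broker list taken from the example in this post.
        props.setProperty("bootstrap.servers", "spark:9092,spark1:9092,spark2:9092");
        // Comma-separated class names, in invocation order.
        props.setProperty("interceptor.classes", String.join(",", interceptorClasses));
        return props;
    }
}
```

Passing TimeInterceptor first and CountInterceptor second would, under this configuration, prepend the timestamp before the counting interceptor sees the record.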