Chapter 2 Installing Kafka

Note: ZooKeeper must be installed and running before Kafka is installed, and the clocks of all nodes must be kept in sync.

2.1 Download, Upload, and Extract the Archive

cd /export/softwares

tar -zxvf kafka_2.11-1.0.0.tgz  -C ../servers/
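
The same Kafka directory is needed on node02 and node03 as well. A minimal sketch of distributing it, assuming the archive was extracted on node01 and SSH access to the other two nodes is already set up:

scp -r /export/servers/kafka_2.11-1.0.0 node02:/export/servers/

scp -r /export/servers/kafka_2.11-1.0.0 node03:/export/servers/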

 

2.2 Edit the Configuration Files

Edit the configuration file on the first node (node01):

cd /export/servers/kafka_2.11-1.0.0/config

vim  server.properties

 

broker.id=0

log.dirs=/export/servers/kafka_2.11-1.0.0/logs

zookeeper.connect=node01:2181,node02:2181,node03:2181

delete.topic.enable=true

host.name=node01

 

 

Edit the configuration file on the second node (node02):

cd /export/servers/kafka_2.11-1.0.0/config

vim  server.properties

 

broker.id=1

log.dirs=/export/servers/kafka_2.11-1.0.0/logs

zookeeper.connect=node01:2181,node02:2181,node03:2181

delete.topic.enable=true

host.name=node02

 

Edit the configuration file on the third node (node03):

cd /export/servers/kafka_2.11-1.0.0/config

vim  server.properties

 

broker.id=2

log.dirs=/export/servers/kafka_2.11-1.0.0/logs

zookeeper.connect=node01:2181,node02:2181,node03:2181

delete.topic.enable=true

host.name=node03
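
The data directory referenced by log.dirs does not ship with the extracted package. Kafka will normally create it on first startup, but creating it up front on each of the three nodes avoids permission surprises; a minimal sketch:

mkdir -p /export/servers/kafka_2.11-1.0.0/logs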

2.3 Starting Kafka in the Foreground or Background

Start the Kafka cluster on all three machines. The commands below are run from the Kafka installation directory (/export/servers/kafka_2.11-1.0.0).

 

Foreground start:

bin/kafka-server-start.sh config/server.properties

 

Background start:

nohup bin/kafka-server-start.sh config/server.properties &

 

nohup bin/kafka-server-start.sh config/server.properties > /dev/null 2>&1 &
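
After a background start, a quick sanity check is to confirm that the broker process is up on each node and that all three brokers have registered in ZooKeeper. A minimal sketch (the second command uses the zookeeper-shell.sh script shipped with Kafka):

jps | grep Kafka

bin/zookeeper-shell.sh node01:2181 ls /brokers/ids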

 

2.4 Producing and Consuming from the Console

Create the topic:

bin/kafka-topics.sh --create --partitions 3 --topic test --replication-factor 2 --zookeeper node01:2181,node02:2181,node03:2181

Start a console producer:

bin/kafka-console-producer.sh --broker-list node01:9092,node02:9092,node03:9092 --topic test

Start a console consumer:

bin/kafka-console-consumer.sh --bootstrap-server node01:9092,node02:9092,node03:9092 --from-beginning --topic test
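
To confirm how the partitions and replicas of the topic were assigned across the three brokers, the topic can be described; a minimal sketch:

bin/kafka-topics.sh --describe --topic test --zookeeper node01:2181,node02:2181,node03:2181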

 

Chapter 3 Producing and Consuming with the Kafka Java API

3.1 Producer API

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class MyKafkaProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // brokers to bootstrap from
        props.put("bootstrap.servers", "node01:9092,node02:9092,node03:9092");
        // wait for all in-sync replicas to acknowledge each write
        props.put("acks", "all");
        props.put("retries", 0);
        props.put("batch.size", 16384);
        props.put("linger.ms", 1);
        props.put("buffer.memory", 33554432);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Producer<String, String> producer = new KafkaProducer<String, String>(props);
        // send 100 messages to the "test" topic, using the loop index as both key and value
        for (int i = 0; i < 100; i++) {
            producer.send(new ProducerRecord<String, String>("test", Integer.toString(i), Integer.toString(i)));
        }
        producer.close();
    }
}

 

3.2 Consumer API

 

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class MyKafkaConsumer {
    public static void main(String[] args) {
        /**
         * Option 1 (commented out): automatic offset commits
         */

      /*  //This version commits offsets automatically. The offset records which record we have consumed up to,
        //i.e. where the previous consumption stopped.
        //In newer Kafka versions the offsets are stored in an internal topic.
        //Before consuming, the consumer reads the offset to know which record to resume from.
        //After consuming, the offset must be updated, otherwise the same records would be consumed again.
        Properties props = new Properties();
        props.put("bootstrap.servers", "node01:9092,node02:9092,node03:9092");
        //the consumer group this consumer belongs to; any name is fine as long as it does not clash with other groups
        props.put("group.id", "test");
        //commit offsets automatically
        props.put("enable.auto.commit", "true");
        //how often offsets are auto-committed; e.g. if a commit happens at 1s and the consumer crashes at 1.6s
        //after reading 500 more records, those records are re-delivered because the next commit at 2s never happened
        props.put("auto.commit.interval.ms", "1000");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
        //subscribe the consumer to the topic
        consumer.subscribe(Arrays.asList("test"));
        //loop forever, pulling new records from the topic as they arrive
        while (true) {
            //Kafka consumers pull (poll) data rather than having it pushed to them
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records)
                System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
        }*/

      /*
      Option 2: manual offset commits. This is how a consumer (e.g. Spark reading from Kafka) moves towards exactly-once processing:
      commit the offset only when the data has been processed successfully; if processing fails, do not commit.
       */
        Properties props = new Properties();
        props.put("bootstrap.servers", "node01:9092,node02:9092,node03:9092");
        props.put("group.id", "test");
        //disable automatic offset commits
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
        consumer.subscribe(Arrays.asList("test"));
        final int minBatchSize = 200;
        List<ConsumerRecord<String, String>> buffer = new ArrayList<ConsumerRecord<String, String>>();
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                buffer.add(record);
            }
            if (buffer.size() >= minBatchSize) {
             //   insertIntoDb(buffer);   // placeholder: process the batch here
                //commit the offsets manually, only after the batch has been processed
                consumer.commitSync();
                buffer.clear();
            }
        }
    }
}

Chapter 4 Case Study 1: Integrating Flume with Kafka

4.1 Requirements

Requirement: use Flume to monitor a directory for newly created files; whenever a new file appears, collect its contents and deliver them to a Kafka topic.

source: spooldir source

channel: memory channel

sink: Kafka sink (sends the data into Kafka)

4.2 Steps

Developing the Flume-to-Kafka configuration file

Step 1: Flume download URL

http://archive.cloudera.com/cdh5/cdh/5/flume-ng-1.6.0-cdh5.14.0.tar.gz

 

Step 2: upload and extract Flume

 

Step 3: configure flume.conf

#name the source, channel, and sink

a1.sources = r1

a1.channels = c1

a1.sinks = k1

#bind the source to the channel it writes to

a1.sources.r1.channels = c1

#configure the source (spooling directory)

a1.sources.r1.type = spooldir

a1.sources.r1.spoolDir = /export/servers/flumedata

a1.sources.r1.deletePolicy = never

a1.sources.r1.fileSuffix = .COMPLETED

a1.sources.r1.ignorePattern = ^(.)*\\.tmp$

a1.sources.r1.inputCharset = GBK

#use a memory channel, i.e. all events are buffered in memory

a1.channels.c1.type = memory

#use the Kafka sink and bind it to the channel it reads from

a1.sinks.k1.channel = c1

a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink

a1.sinks.k1.kafka.topic = test

a1.sinks.k1.kafka.bootstrap.servers = node01:9092,node02:9092,node03:9092

a1.sinks.k1.kafka.flumeBatchSize = 20

a1.sinks.k1.kafka.producer.acks = 1
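
Note: the spooling directory configured above must exist before the agent is started, otherwise the spooldir source will fail. A minimal sketch, run on the node that hosts the Flume agent:

mkdir -p /export/servers/flumedata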

 

Step 4: start Flume

bin/flume-ng agent --conf conf --conf-file conf/flume.conf --name a1 -Dflume.root.logger=INFO,console
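
To test the pipeline end to end, drop a file into the spooling directory and watch the topic with a console consumer. A minimal sketch (the file name and contents are arbitrary; the consumer command is run from the Kafka installation directory):

echo "hello kafka" > /export/servers/flumedata/a.txt

bin/kafka-console-consumer.sh --bootstrap-server node01:9092,node02:9092,node03:9092 --topic test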

Chapter 5 The kafka-manager Monitoring Tool

5.1 Upload and Extract the Pre-built Package

Upload the pre-built kafka-manager package to the server and extract it:

cd  /export/softwares

unzip kafka-manager-1.3.3.15.zip -d /export/servers/

 

5.2 Edit the Configuration File

cd /export/servers/kafka-manager-1.3.3.15/

vim  conf/application.conf

 

kafka-manager.zkhosts="node01:2181,node02:2181,node03:2181"

5.3 Make the kafka-manager Startup Scripts Executable

cd /export/servers/kafka-manager-1.3.3.15/bin

chmod u+x ./*

5.4 Start the kafka-manager Process

cd /export/servers/kafka-manager-1.3.3.15

nohup bin/kafka-manager  -Dconfig.file=/export/servers/kafka-manager-1.3.3.15/conf/application.conf -Dhttp.port=8070   2>&1 &
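
Once the process is up, it should be listening on the port passed via -Dhttp.port. A minimal sketch of checking this, assuming net-tools (netstat) is installed:

netstat -nltp | grep 8070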

 

5.5 Access the Web UI from a Browser

http://node01:8070/