Table of Contents

1. Introduction

2. Environment Setup

2.1 Preparing the Docker environment

2.2 Installing zookeeper, kafka, and kafka-manager

2.2.1 zookeeper

2.2.2 kafka

2.2.3 kafka-manager

2.3 Installing flume

2.4 Installing flink

3. Program Development

3.1 Writing logs to flume

3.2 Reading data from kafka

Receiving with flink

Verifying the data




1. Introduction

The components used in this experiment are docker, kafka, kafka-manager, zookeeper, and flume. Due to resource constraints, kafka and zookeeper run inside docker, while flume and kafka-manager are installed directly on the test machine.

Experiment content: 1. Generate log data locally, collect the logs into flume via log4j, and have flume sink the data into kafka; 2. flume then reads the data back from kafka and prints it to the console (alternatively, flink takes the data from kafka, adds an identifying field, and writes it into another kafka topic; see the flink section below).

Experiment goals: learn how to install and use docker, operate kafka and flume, and deploy the whole pipeline.


Required Maven dependencies:

<!-- kafka start -->
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka_2.11</artifactId>
            <version>0.9.0.1</version>
        </dependency>

        <!-- kafka end -->

        <!--flume-->
        <dependency>
            <groupId>org.apache.flume.flume-ng-clients</groupId>
            <artifactId>flume-ng-log4jappender</artifactId>
            <version>1.7.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flume</groupId>
            <artifactId>flume-ng-core</artifactId>
            <version>1.7.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flume</groupId>
            <artifactId>flume-ng-configuration</artifactId>
            <version>1.7.0</version>
        </dependency>

        <!--flink-->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>1.5.0</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-connector-kafka-0.9 -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka-0.9_2.11</artifactId>
            <version>1.5.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.11</artifactId>
            <version>1.5.0</version>
        </dependency>




        <!-- log4j start -->
        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>1.2.17</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>1.7.5</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>1.7.5</version>
        </dependency>
        <!-- log4j end -->

        <!-- https://mvnrepository.com/artifact/com.alibaba/fastjson -->
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.47</version>
        </dependency>

2. Environment Setup

2.1 Preparing the Docker environment

The test machine runs CentOS 7.
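Docker itself can be installed from the official docker-ce yum repository; a minimal sketch for CentOS 7 (assuming internet access and root privileges):

yum install -y yum-utils
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum install -y docker-ce
systemctl start docker
systemctl enable docker
docker version   # verify the installation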


2.2 Installing zookeeper, kafka, and kafka-manager

2.2.1 zookeeper

Use the docker search zookeeper command to list the zookeeper images available in the registry.

First pull wurstmeister's zookeeper image.
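For example:

docker pull wurstmeister/zookeeper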

Start zookeeper: docker run -d --name zookeeper -p 2181:2181 -t wurstmeister/zookeeper

2.2.2 kafka

Use the docker search kafka command to list the kafka images available in the registry.

First pull wurstmeister's kafka image.
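For example:

docker pull wurstmeister/kafka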

Start kafka:

docker run -d --name kafka -p 9092:9092  -e KAFKA_BROKER_ID=0  -e KAFKA_ZOOKEEPER_CONNECT=192.168.83.112:2181  -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://192.168.83.112:9092  -e KAFKA_LISTENERS=PLAINTEXT://0.0.0.0:9092 -t wurstmeister/kafka

Explanation:

KAFKA_BROKER_ID=0               // broker id; to start more brokers, run the command again with a different id each time

KAFKA_ZOOKEEPER_CONNECT=192.168.83.112:2181  // the zookeeper address that kafka connects to

KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://192.168.83.112:9092  // the listener address advertised to external clients; use the host machine's address

KAFKA_LISTENERS=PLAINTEXT://0.0.0.0:9092  // the address the broker binds to inside the container
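To check that the broker is up, you can create and list a topic from inside the container. A sketch, assuming the standard Kafka CLI scripts shipped with the wurstmeister/kafka image under /opt/kafka/bin:

docker exec -it kafka bash
/opt/kafka/bin/kafka-topics.sh --create --zookeeper 192.168.83.112:2181 --replication-factor 1 --partitions 1 --topic test
/opt/kafka/bin/kafka-topics.sh --list --zookeeper 192.168.83.112:2181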

2.2.3 kafka-manager

kafka-manager is installed on the host machine rather than inside docker, because the available docker images were missing components.

Download it from GitHub and build it yourself (build instructions are easy to find online), or download a pre-built package here:

Link: https://pan.baidu.com/s/1zmhG6-eP_0RsGDxvcEMzyw

Password: sc8w

Unpack it to the installation directory, then configure two items in conf/application.conf:

kafka-manager.zkhosts="192.168.83.112:2181"
akka {
  loggers = ["akka.event.slf4j.Slf4jLogger"]
  loglevel = "INFO"
  logger-startup-timeout = 30s
}

Then run the startup command:

nohup bin/kafka-manager -Dconfig.file=conf/application.conf -Dhttp.port=9000 &

2.3 Installing flume

Download the package from the official site, unpack it, and move it to the installation directory.

Configure the Java path in conf/flume-env.sh:

export JAVA_HOME=/usr/java/jdk1.8.0_171-amd64
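If flume-env.sh does not exist yet, it can be created from the template. A minimal sketch, assuming flume is unpacked to /opt/soft/apache-flume-1.8.0-bin (the path used by the flume-ng commands later in this article):

cd /opt/soft/apache-flume-1.8.0-bin/conf
cp flume-env.sh.template flume-env.sh
echo 'export JAVA_HOME=/usr/java/jdk1.8.0_171-amd64' >> flume-env.sh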

2.4 Installing flink

Download the package from the official site, unpack it, and move it to the installation directory.

Since this experiment uses the standalone mode, simply run ./start-cluster.sh in the bin directory; the web UI listens on port 8081.

At this point the whole environment is ready.


3. Program Development

3.1 Writing logs to flume

log4j configuration:

log4j.rootLogger=INFO,flume,console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.out
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %p [%c:%L] - %m%n

log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.Hostname = 192.168.83.112
log4j.appender.flume.Port = 44444
log4j.appender.flume.UnsafeMode = true
log4j.appender.flume.layout=org.apache.log4j.PatternLayout
log4j.appender.flume.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %p [%c:%L] - %m%n

Generate logs in a loop:

import java.util.Date;
import org.apache.log4j.Logger;

public class WriteLog {
    private static Logger logger = Logger.getLogger(WriteLog.class);

    public static void main(String[] args) throws InterruptedException {
        // log a debug-level message
        logger.debug("This is debug message.");
        // log an info-level message
        logger.info("This is info message.");
        // log an error-level message
        logger.error("This is error message.");
        int i = 0;
        while (true) {
            logger.info(new Date().getTime());
            logger.info("test data " + i);
            Thread.sleep(2000);
            i += 1;
        }
    }
}

In flume's conf directory, copy the template to a new file named example.conf.
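For example, assuming the standard template name shipped with flume:

cd /opt/soft/apache-flume-1.8.0-bin/conf
cp flume-conf.properties.template example.conf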

Write the following configuration:

# Name the components on this agent

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source

# Use an avro source to receive the events sent from log4j
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44444

# Describe the sink

#a1.sinks.k1.type = logger
# Write the events to kafka; set the topic and the broker list
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = test
a1.sinks.k1.brokerList = 192.168.83.112:9092
a1.sinks.k1.requiredAcks = 1
a1.sinks.k1.batchSize = 100

# Use a channel which buffers events in memory

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Run the flume command:

flume-ng agent -c /opt/soft/apache-flume-1.8.0-bin/conf -f example.conf --name a1 -Dflume.root.logger=INFO,console

Start the Java program.
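For example, if the project is built with Maven and log4j.properties sits on the classpath (e.g. under src/main/resources), the log generator can be started roughly like this:

mvn compile exec:java -Dexec.mainClass=WriteLog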


3.2 Reading data from kafka

Java program (note: despite its name, CustomerSource below is a custom flume sink that prints each event to the console):

import org.apache.flume.*;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;

/**
 * Created by anan on 2018-7-31 14:20.
 */
public class CustomerSource extends AbstractSink implements Configurable {
    @Override
    public Status process() throws EventDeliveryException {
        Status status = null;
        // Start transaction
        Channel ch = getChannel();
        Transaction txn = ch.getTransaction();
        txn.begin();
        try {
            // This try clause includes whatever Channel operations you want to
            // do
            Event event = ch.take();
            if (event == null) {
                // channel is empty; commit the empty transaction and back off
                status = Status.BACKOFF;
            } else {
                // "Send" the Event to the external repository: here we simply print it
                String eventBody = new String(event.getBody(), "utf-8");
                System.out.println("============= " + eventBody + " ========");
                status = Status.READY;
            }
            txn.commit();
        } catch (Throwable t) {
            txn.rollback();
            // Log exception, handle individual exceptions as needed
            status = Status.BACKOFF;

            // re-throw all Errors
            if (t instanceof Error) {
                throw (Error) t;
            }
        }
        // you must add this line of code in order to close the Transaction.
        txn.close();
        return status;
    }

    @Override
    public void configure(Context context) {

    }

    @Override
    public synchronized void start() {
        super.start();
    }

    @Override
    public synchronized void stop() {
        super.stop();
    }
}

Create a new flume conf file named test.conf:

# Names of the source, channel, and sink on this agent
agent.sources = kafkaSource
agent.channels = memoryChannel
agent.sinks = hdfsSink


agent.sources.kafkaSource.channels = memoryChannel
agent.sinks.hdfsSink.channel = memoryChannel

#-------- kafkaSource settings -----------------
agent.sources.kafkaSource.type = org.apache.flume.source.kafka.KafkaSource
agent.sources.kafkaSource.zookeeperConnect =192.168.83.112:2181
# the kafka topic to consume
agent.sources.kafkaSource.topic = test
# the consumer group id
agent.sources.kafkaSource.groupId = flume
# Consumer timeout. Other kafka consumer options can be set in the same way; note that properties whose names start with kafka. are consumer configuration properties.
agent.sources.kafkaSource.kafka.consumer.timeout.ms = 100



#------- memoryChannel settings -------------------------
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity=10000
agent.channels.memoryChannel.transactionCapacity=1000

#--------- hdfsSink settings (here, the custom console sink) ------------------
agent.sinks.hdfsSink.type = com.gd.bigdataleran.flume.customerSource.CustomerSource
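The custom sink class has to be on flume's classpath before the agent can load it. A rough sketch, assuming the class is built in a Maven project and flume lives under /opt/soft/apache-flume-1.8.0-bin (the jar name is illustrative):

mvn clean package
cp target/flume-customer-sink.jar /opt/soft/apache-flume-1.8.0-bin/lib/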

Run the flume command:

flume-ng agent -c /opt/soft/apache-flume-1.8.0-bin/conf -f test.conf --name agent -Dflume.root.logger=INFO,console

Then check whether the generated logs are printed to the console.

Receiving with flink

To have flink consume the kafka data, apply a simple transformation, and write the result back to kafka, you need a FlinkKafkaConsumer and a FlinkKafkaProducer. The Java code is as follows:

package com.gd.bigdataleran.flink;

import java.util.Date;
import java.util.Properties;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer09;

/**
 * Created by anan on 2018-8-3 15:44.
 */
public class kafkaconsumer {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(5000); // important: enable checkpointing
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "192.168.83.112:9092");
//        properties.setProperty("zookeeper.connect", "192.168.83.112:2181");
        properties.setProperty("group.id", "test112");

        // consume the "test" topic, starting from the earliest offset
        FlinkKafkaConsumer09<String> myConsumer = new FlinkKafkaConsumer09<>("test", new SimpleStringSchema(), properties);
        myConsumer.setStartFromEarliest();
        System.out.println("registering source");
        DataStream<String> stream = env.addSource(myConsumer);

        // append a timestamp field to every record
        DataStream<String> ds = stream.map(new MapFunction<String, String>() {
            @Override
            public String map(String s) throws Exception {
                return s + "==" + new Date().getTime();
            }
        });

        // write the transformed records to the "test1" topic
        FlinkKafkaProducer09<String> flinkKafkaProducer09 =
                new FlinkKafkaProducer09<String>("192.168.83.112:9092", "test1", new SimpleStringSchema());
        ds.addSink(flinkKafkaProducer09);
        System.out.println("registering sink");

        env.execute();
    }
}
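Before submitting, the job has to be packaged into a jar. With a Maven project this is roughly the following (the resulting jar name depends on your pom, so treat it as a placeholder):

mvn clean package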

Copy the jars the job needs (e.g. the flink kafka connector and the kafka client) into flink's lib directory, then submit the job with the flink command, for example:

bin/flink run -c com.gd.bigdataleran.flink.kafkaconsumer /path/to/your-job.jar

Verifying the data

Use Java to connect to kafka and read the data from the topic to verify it. The verification code is as follows:
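The method below relies on the old Kafka high-level consumer API that ships with the kafka_2.11 0.9.0.1 dependency declared earlier. Assuming the method lives in an ordinary test class, the imports it needs are roughly:

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;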

private void getKafkaData() {
        String topic = "test1";
        Properties kafkaProps = new Properties();
        kafkaProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        kafkaProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        kafkaProps.put("bootstrap.servers", "192.168.83.112:9092");
        // the old high-level consumer connects through zookeeper
        kafkaProps.put("zookeeper.connect", "192.168.83.112:2181");
        kafkaProps.put("group.id", "farmtest1");
        kafkaProps.put("auto.offset.reset", "smallest");
        kafkaProps.put("enable.auto.commit", "true");
        ConsumerConnector consumer = Consumer.createJavaConsumerConnector(new ConsumerConfig(kafkaProps));

        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put(topic, 1); // consume the topic with a single stream (thread)
        Map<String, List<KafkaStream<byte[], byte[]>>> messageStreams = consumer.createMessageStreams(topicCountMap);
        KafkaStream<byte[], byte[]> stream = messageStreams.get(topic).get(0); // the single stream created above
        ConsumerIterator<byte[], byte[]> iterator = stream.iterator();
        while (iterator.hasNext()) {
            String message = new String(iterator.next().message());
            consumer.commitOffsets();
            System.out.println(message);
        }
    }


This post was written in a hurry; if you run into problems, questions and discussion are welcome.