Earlier posts covered integrating Spark Streaming with Flume and with Kafka separately, but in real-world development what you usually need is Spark Streaming working with Kafka and Flume together.

Let's look at how to do that.

First, the overall architecture:

[Architecture diagram: log-producing application → Flume → Kafka → Spark Streaming → storage / visualization]


External software produces data in real time. Flume collects that data as it arrives, and a Kafka sink forwards it to Kafka, which acts as a buffer. Spark Streaming then consumes those messages as its data source, performs the business computation, and finally writes the results to storage or a visualization layer.

Next, the overall implementation plan:

  1. First, simulate an APP server that produces data in real time (a minimal sketch follows this list)
  2. Then configure the logging output
    The log level is INFO, the output goes to the console via System.out, and the format is the one given by the ConversionPattern
  3. Flume log collection
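For step 1, a minimal log generator built on log4j could look like the following sketch; the object name, the message text, and the one-second interval are assumptions for illustration:

import org.apache.log4j.Logger

// Minimal stand-in for the "APP server": emits one log line per second through log4j,
// so log4j.properties decides where the records go (console now, Flume later).
object LoggerGenerator {

  private val logger = Logger.getLogger(getClass.getName)

  def main(args: Array[String]): Unit = {
    var index = 0
    while (true) {
      Thread.sleep(1000)
      logger.info("value : " + index)
      index += 1
    }
  }
}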

streaming.conf

# Name the components on this agent
agent1.sources = avro-source
agent1.channels = logger-channel
agent1.sinks = log-sink

# Describe/configure the source
agent1.sources.avro-source.type = avro
agent1.sources.avro-source.bind = 0.0.0.0
agent1.sources.avro-source.port = 41414

# Describe the channel
agent1.channels.logger-channel.type = memory
agent1.channels.logger-channel.capacity = 1000
agent1.channels.logger-channel.transactionCapacity = 100

# Describe the sink
agent1.sinks.log-sink.type = logger

# Bind the source and sink to the channel
agent1.sources.avro-source.channels = logger-channel
agent1.sinks.log-sink.channel = logger-channel
  4. Next, connect the generated log messages to Flume. Start from how the official Flume Log4jAppender is defined.

    Configure log4j.properties accordingly, following the official documentation (a sketch of this file follows the dependency block below):

    First add the corresponding dependency to the pom:
<dependency>
    <groupId>org.apache.flume.flume-ng-clients</groupId>
    <artifactId>flume-ng-log4jappender</artifactId>
    <version>1.7.0</version>
</dependency>
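A log4j.properties consistent with the avro source configured in streaming.conf could look roughly like this; the appender names, the ConversionPattern, and the UnsafeMode setting are assumptions, and Hostname must be the host where the Flume agent runs (hadoop1 here):

log4j.rootLogger=INFO,stdout,flume

# Console output via System.out, formatted by the ConversionPattern
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} [%-5p] %c - %m%n

# Flume Log4jAppender: forwards every log event to the agent's avro source (port 41414)
log4j.appender.flume=org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.Hostname=hadoop1
log4j.appender.flume.Port=41414
log4j.appender.flume.UnsafeMode=true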

Start Flume:

[1@hadoop1 conf]$ flume-ng agent \
> --name agent1 \
> --conf $FLUME_HOME/conf \
> --conf-file $FLUME_HOME/conf/streaming.conf \
> -Dflume.root.logger=INFO,console

Here --name must match the agent name used in streaming.conf (agent1), --conf points to Flume's configuration directory, --conf-file points to the agent configuration file, and -Dflume.root.logger=INFO,console prints the agent's log to the console.
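Before wiring log4j to the agent, you can optionally check that it is listening on port 41414 by sending it a file with Flume's bundled avro-client (the file path here is an assumption). Each line of the file should then show up in the agent's console output, because the sink is a logger sink:

flume-ng avro-client \
--host hadoop1 \
--port 41414 \
--filename /home/hadoop1/data/test.log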

At this point the generated logs can reach Flume; the next step is to forward the data Flume collects into Kafka.

  5. Connect Flume to Kafka

First start the Kafka broker in the background:

kafka-server-start.sh \
-daemon /home/hadoop1/modules/kafka_2.11-0.11.0.2/config/server.properties
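Note that the broker needs ZooKeeper to be up first (the commands here point at hadoop1:2181). If you use the ZooKeeper scripts bundled with the Kafka distribution, starting it would look roughly like this (a sketch; a standalone ZooKeeper installation works just as well):

zookeeper-server-start.sh \
-daemon /home/hadoop1/modules/kafka_2.11-0.11.0.2/config/zookeeper.properties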

Now create a topic for testing:

[1@hadoop1 kafka_2.11-0.11.0.2]$ kafka-topics.sh \
--create \
--zookeeper hadoop1:2181 \
--replication-factor 1 \
--partitions 1 \
--topic streaming_topic
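To double-check that the topic exists before pointing Flume at it, you can describe it against the same ZooKeeper address:

kafka-topics.sh \
--describe \
--zookeeper hadoop1:2181 \
--topic streaming_topic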

Create a new flume.conf that forwards the collected data to Kafka. As before, start from the official documentation, this time for the Kafka sink:

[Screenshot: Kafka sink section of the official Flume documentation]


The flume.conf configuration is as follows:

# Name the components on this agent
agent1.sources = avro-source
agent1.channels = logger-channel
agent1.sinks = kafka-sink

# Describe/configure the source
agent1.sources.avro-source.type = avro
agent1.sources.avro-source.bind = 0.0.0.0
agent1.sources.avro-source.port = 41414

# Describe the channel
agent1.channels.logger-channel.type = memory
agent1.channels.logger-channel.capacity = 1000
agent1.channels.logger-channel.transactionCapacity = 100

# Describe the sink
agent1.sinks.kafka-sink.type = org.apache.flume.sink.kafka.KafkaSink
agent1.sinks.kafka-sink.kafka.topic = streaming_topic
agent1.sinks.kafka-sink.kafka.bootstrap.servers = 192.168.2.161:9092
agent1.sinks.kafka-sink.kafka.flumeBatchSize = 20
agent1.sinks.kafka-sink.kafka.producer.acks = 1
agent1.sinks.kafka-sink.kafka.producer.linger.ms = 1

# Bind the source and sink to the channel
agent1.sources.avro-source.channels = logger-channel
agent1.sinks.kafka-sink.channel = logger-channel

Start Flume with this new configuration file, then start a Kafka console consumer:

[1@hadoop1 kafka_2.11-0.11.0.2]$ kafka-console-consumer.sh \
--bootstrap-server hadoop1:9092 \
--topic streaming_topic

Then just run our program; its log output should now show up in the console consumer.

  6. Spark Streaming consumes the data from Kafka and performs the business computation:
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaStreamingApp {
  def main(args: Array[String]): Unit = {

    if (args.length != 4) {
      System.err.println("Usage: KafkaStreamingApp <zkQuorum> <group> <topics> <numThreads>")
      System.exit(1)
    }

    val Array(zkQuorum, group, topics, numThreads) = args

    val sparkConf = new SparkConf().setAppName("KafkaStreamingApp").setMaster("local[*]")
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // topics may be a comma-separated list; each topic is read with numThreads receiver threads
    val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap

    // Receiver-based stream: each record is a (key, message) pair
    val messages = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap)

    // Simple sanity check: count the records received in each 2-second batch
    messages.map(_._2).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
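The import org.apache.spark.streaming.kafka.KafkaUtils comes from the receiver-based Kafka 0.8 integration module, so the pom also needs a dependency along these lines; the version here is an assumption and should match your Spark and Scala versions:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-8_2.11</artifactId>
    <version>2.2.0</version>
</dependency>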

Finally, configure the program arguments in the run configuration:

[Screenshot: run configuration with the four program arguments]
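The four program arguments correspond to <zkQuorum> <group> <topics> <numThreads>; with the setup above they would be something like the following (the consumer group name is an arbitrary choice):

hadoop1:2181 streaming_group streaming_topic 1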


Start everything described above, run the program, and the expected result appears.