Application Software Installation


Table of Contents

  • Application Software Installation
  • 1. Installing Flume
  • 2. Modifying the Configuration
  • 3. Customizing the HBase Sink



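1. Installing Flume

There are three servers: master, slave1, and slave2.

Upload the archive apache-flume-1.8.0-bin.tar.gz to the home directory (~) on master, which is where the commands below are run.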

Create the installation directory:

mkdir /usr/flume

Set permissions on the archive (optional, since tar only needs to read the file):

cd ~
chmod u+x apache-flume-1.8.0-bin.tar.gz

Extract apache-flume-1.8.0-bin.tar.gz into /usr/flume (the -C option sets the target directory):

cd ~
tar -zxvf apache-flume-1.8.0-bin.tar.gz -C /usr/flume
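
The -C option is what tells tar where to unpack; without it, a trailing path is treated as the name of a member to pull out of the archive. The following is a minimal sketch of that behavior using a throwaway archive in a temporary directory. The names demo.txt and demo.tar.gz are made up for illustration and have nothing to do with the Flume archive.

```shell
# Build a small throwaway archive, then extract it into a separate directory with -C.
workdir=$(mktemp -d)
mkdir -p "$workdir/src" "$workdir/dest"
echo "hello" > "$workdir/src/demo.txt"

# Create the archive; -C changes into src first, so the member path is just demo.txt
tar -zcf "$workdir/demo.tar.gz" -C "$workdir/src" demo.txt

# Extract into dest; the target directory must already exist
tar -zxf "$workdir/demo.tar.gz" -C "$workdir/dest"

extracted=$(cat "$workdir/dest/demo.txt")
echo "$extracted"
```

The same pattern applies to the Flume archive: create /usr/flume first, then extract with -C /usr/flume.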

Configure the environment variables.

Open the file for editing:

vim /etc/profile

Add the following lines:

#set flume
export FLUME_HOME=/usr/flume/apache-flume-1.8.0-bin
export PATH=$PATH:$FLUME_HOME/bin

Save and exit.

Activate the environment variables:

source /etc/profile

Check the version information:

flume-ng version
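
If flume-ng is not found, the usual cause is that the profile changes have not taken effect in the current shell. As a rough sketch, a small helper can confirm both settings before retrying. The function name check_flume_env is an assumption made for this example and is not part of Flume.

```shell
# check_flume_env (hypothetical helper): report whether FLUME_HOME is set
# and whether its bin directory appears on PATH.
check_flume_env() {
  if [ -z "${FLUME_HOME:-}" ]; then
    echo "FLUME_HOME is not set"
    return 1
  fi
  case ":$PATH:" in
    *":$FLUME_HOME/bin:"*)
      echo "ok: $FLUME_HOME/bin is on PATH"
      ;;
    *)
      echo "$FLUME_HOME/bin is missing from PATH"
      return 1
      ;;
  esac
}
```

Calling check_flume_env after source /etc/profile should print the ok line; any other message points at the export lines in /etc/profile.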

If the command fails with:

Error: Could not find or load main class org.apache.flume.tools.GetJavaProperty

comment out the export HBASE_CLASSPATH line in HBase's hbase-env.sh.

2. Modifying the Configuration

Enter the configuration directory:

cd /usr/flume/apache-flume-1.8.0-bin/conf/

Rename the template configuration files:

mv flume-conf.properties.template flume-conf.properties
mv flume-env.sh.template flume-env.sh

Edit flume-env.sh:

vim flume-env.sh

Set JAVA_HOME:

export JAVA_HOME=/usr/java/jdk1.8.0_171

Distribute Flume to the other two servers:

scp -r /usr/flume root@slave1:/usr/
scp -r /usr/flume root@slave2:/usr/

Distribute the profile file to the other two servers:

scp -r /etc/profile root@slave1:/etc/
scp -r /etc/profile root@slave2:/etc/
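
Because the same two copies are repeated for every worker, they can be generated in a loop. The sketch below only prints the commands so they can be reviewed first. The function name print_distribute_cmds is an assumption for this example; the paths and host names are the ones used in this article.

```shell
# print_distribute_cmds (hypothetical helper): print, rather than run,
# the copy commands for each host passed as an argument.
print_distribute_cmds() {
  for host in "$@"; do
    echo "scp -r /usr/flume root@$host:/usr/"
    echo "scp /etc/profile root@$host:/etc/"
  done
}

print_distribute_cmds slave1 slave2
```

Once the printed commands look right, piping the output to sh runs them in order.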

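Activate the environment variables on each server (/etc/profile is also read automatically by every new login shell):

ssh slave1
source /etc/profile
exit

ssh slave2
source /etc/profile
exit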

Edit the configuration file on master.
Enter the configuration directory:

cd /usr/flume/apache-flume-1.8.0-bin/conf/

Open the file:

vim flume-conf.properties

Set the following properties:

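# every source, channel, and sink must be declared here, or Flume ignores its settings
agent1.sources = r1
agent1.channels = hbaseC kafkaC
agent1.sinks = kafkaSink hbaseSink

# source configuration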
agent1.sources.r1.type = avro
agent1.sources.r1.channels = hbaseC kafkaC
agent1.sources.r1.bind = master
agent1.sources.r1.port = 5555
agent1.sources.r1.threads = 5

# channel configuration
agent1.channels.hbaseC.type = memory
agent1.channels.hbaseC.capacity = 100000
agent1.channels.hbaseC.transactionCapacity = 100000
agent1.channels.hbaseC.keep-alive = 20

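# sink configuration
agent1.sinks.hbaseSink.type = asynchbase
agent1.sinks.hbaseSink.table = weblogs
agent1.sinks.hbaseSink.columnFamily = info
# Set serializer to the fully qualified class name of the serializer to use;
# section 3 customizes the SimpleAsyncHbaseEventSerializer class.
agent1.sinks.hbaseSink.serializer =
agent1.sinks.hbaseSink.channel = hbaseC
agent1.sinks.hbaseSink.serializer.payloadColumn = datatime,userid,searchname,retorder,cliorder,cliurl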

# ************************flume + kafka***************************

# channel configuration
agent1.channels.kafkaC.type = memory
agent1.channels.kafkaC.capacity = 100000
agent1.channels.kafkaC.transactionCapacity = 100000
agent1.channels.kafkaC.keep-alive = 20

# sink configuration
agent1.sinks.kafkaSink.channel = kafkaC
agent1.sinks.kafkaSink.type = org.apache.flume.sink.kafka.KafkaSink
agent1.sinks.kafkaSink.topic = weblogs
agent1.sinks.kafkaSink.brokerList = master:9092,slave1:9092,slave2:9092
agent1.sinks.kafkaSink.zookeeperConnect = master:2181,slave1:2181,slave2:2181
agent1.sinks.kafkaSink.requiredAcks = 1
agent1.sinks.kafkaSink.batchSize = 1
agent1.sinks.kafkaSink.serializer.class = kafka.serializer.StringEncoder

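Note: Kafka must be installed on master before the flume + kafka section marked by the comment above can be used.

Edit the configuration file on slave1.
Enter the conf directory:

cd /usr/flume/apache-flume-1.8.0-bin/conf/

Open the file:

vim flume-conf.properties

Set the following properties: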

agent2.sources = r1
agent2.channels = c1
agent2.sinks = k1

# source configuration
agent2.sources.r1.type = exec
agent2.sources.r1.command = tail -F /usr/datas/weblog-flume.log
agent2.sources.r1.channels = c1

# channel configuration
agent2.channels.c1.type = memory
agent2.channels.c1.capacity = 10000
agent2.channels.c1.transactionCapacity = 10000
agent2.channels.c1.keep-alive = 5

# sink configuration
agent2.sinks.k1.type = avro
agent2.sinks.k1.channel = c1
agent2.sinks.k1.hostname = master
agent2.sinks.k1.port = 5555

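Edit the configuration file on slave2:

cd /usr/flume/apache-flume-1.8.0-bin/conf/
vim flume-conf.properties

Set the following properties:

agent3.sources = r1
agent3.channels = c1
agent3.sinks = k1

# source configuration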
agent3.sources.r1.type = exec
agent3.sources.r1.command = tail -F /usr/datas/weblog-flume.log
agent3.sources.r1.channels = c1

# channel configuration
agent3.channels.c1.type = memory
agent3.channels.c1.capacity = 10000
agent3.channels.c1.transactionCapacity = 10000
agent3.channels.c1.keep-alive = 5

# sink configuration
agent3.sinks.k1.type = avro
agent3.sinks.k1.channel = c1
agent3.sinks.k1.hostname = master
agent3.sinks.k1.port = 5555
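
Each agent's file needs three top-level lists (sources, channels, sinks) naming the components that the rest of the file configures, and a missing list is easy to overlook. The following is a minimal sketch of a check for that. The function name check_agent_decl is an assumption for this example; it only looks for the three keys and does not validate anything else.

```shell
# check_agent_decl FILE AGENT (hypothetical helper): verify that FILE declares
# the agent's sources, channels and sinks lists.
check_agent_decl() {
  file=$1
  agent=$2
  status=0
  for kind in sources channels sinks; do
    # Match lines such as "agent3.sinks = k1", allowing optional whitespace
    if grep -Eq "^[[:space:]]*$agent\.$kind[[:space:]]*=" "$file"; then
      echo "$agent.$kind: declared"
    else
      echo "$agent.$kind: MISSING"
      status=1
    fi
  done
  return $status
}
```

For example, check_agent_decl flume-conf.properties agent3 run in the conf directory on slave2 reports any list that still needs to be added.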

Create the data directory:

mkdir -p /usr/datas

Upload the prepared dataset to master:

SogouQ.reduced.tar.gz

Extract it into the datas directory on master:

tar -zxvf SogouQ.reduced.tar.gz -C /usr/datas

Then run the following commands to convert the field separators.

The fields need to be separated by commas, which takes two commands:

cat SogouQ.reduced |tr "\t" "," > weblog2.log

Explanation: this processes every line of the source file, replaces each tab with a comma, and writes the result to a new file, weblog2.log.

cat weblog2.log |tr " " "," > weblog.log

Explanation: this processes every line of weblog2.log, replaces each space with a comma, and writes the result to a new file, weblog.log.
Finally, delete weblog2.log.
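
Since tr maps one set of characters onto another, both substitutions can also be done in a single pass, which avoids the intermediate file. The sketch below shows this on one made-up record; the sample values are invented for illustration and are not taken from the dataset.

```shell
# One made-up record: fields separated by tabs, with one space-separated pair.
sample=$(printf '00:00:00\tuser1\t[query]\t1 2\thttp://example.com/')

# tr maps each character of the first set to the matching character of the
# second set, so tab and space both become commas in one pass.
converted=$(printf '%s\n' "$sample" | tr '\t ' ',,')
echo "$converted"

# Applied to the whole file, the equivalent would be:
#   tr '\t ' ',,' < SogouQ.reduced > weblog.log
```

The converted record has six comma-separated fields, matching the six names listed in serializer.payloadColumn.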

3. Customizing the HBase Sink

First download the source distribution of Flume.

Open /apache-flume-1.8.0-src/flume-ng-sinks/flume-ng-hbase-sink in IDEA and locate the SimpleAsyncHbaseEventSerializer class:

@Override
  public List<PutRequest> getActions() {
    List<PutRequest> actions = new ArrayList<PutRequest>();
    if (payloadColumn != null) {
      byte[] rowKey;
      try {

        // payloadColumn holds the column names; each record has six columns separated by commas
        String[] columns = new String(this.payloadColumn, Charsets.UTF_8).split(",");
        String[] values = new String(this.payload, Charsets.UTF_8).split(",");

        // Skip malformed records before indexing into values
        if (columns.length != values.length || columns.length < 3) {
          return actions;
        }

        // Build the row key once per event so that all of its columns land in the same row.
        // getKfkRowKey is a custom helper, not shown here, that has to exist in SimpleRowKeyGenerator.
        String datetime = values[0];
        String userid = values[1];
        rowKey = SimpleRowKeyGenerator.getKfkRowKey(userid, datetime);

        // Add one PutRequest per column to actions
        for (int i = 0; i < columns.length; i++) {
          byte[] colColumn = columns[i].getBytes(Charsets.UTF_8);
          byte[] colValue = values[i].getBytes(Charsets.UTF_8);
          actions.add(new PutRequest(table, rowKey, cf, colColumn, colValue));
        }

      } catch (Exception e) {
        throw new FlumeException("Could not get row key!", e);
      }
    }
    return actions;
  }
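
To make the pairing logic easier to follow outside the Java code, here is a rough shell approximation: it splits the column list and one record on commas, rejects the record if the counts differ or there are fewer than three columns, and otherwise prints one name=value pair per column. The function name pair_columns and the sample record are assumptions for this example, and awk's field splitting is not identical to Java's String.split in every edge case (trailing empty fields, for instance).

```shell
# pair_columns COLUMNS RECORD (hypothetical helper): print name=value for each
# column, or nothing if the record does not match the column list.
pair_columns() {
  printf '%s\n' "$2" | awk -F, -v cols="$1" '{
    n = split(cols, names, ",")
    if (n != NF || n < 3) exit          # same guard as the serializer
    for (i = 1; i <= n; i++) print names[i] "=" $i
  }'
}

columns="datatime,userid,searchname,retorder,cliorder,cliurl"
record="00:00:00,user1,[query],1,2,http://example.com/"
pairs=$(pair_columns "$columns" "$record")
echo "$pairs"
```

Each printed pair corresponds to one PutRequest in the serializer, all sharing the row key derived from the first two values.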

Then package the source.

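The build produces flume-ng-hbase-sink.jar. Delete the existing flume-ng-hbase-sink-1.8.0.jar from the Flume installation, then rename the new jar to flume-ng-hbase-sink-1.8.0.jar.

Delete the old jar from the Flume lib directory: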

rm -rf  /usr/flume/apache-flume-1.8.0-bin/lib/flume-ng-hbase-sink-1.8.0.jar

Copy the renamed jar into place:

cp flume-ng-hbase-sink-1.8.0.jar /usr/flume/apache-flume-1.8.0-bin/lib/

Distribute the jar to slave1 and slave2:

scp -r /usr/flume/apache-flume-1.8.0-bin/lib/flume-ng-hbase-sink-1.8.0.jar root@slave1:/usr/flume/apache-flume-1.8.0-bin/lib/

scp -r /usr/flume/apache-flume-1.8.0-bin/lib/flume-ng-hbase-sink-1.8.0.jar root@slave2:/usr/flume/apache-flume-1.8.0-bin/lib/

This concludes the section.