Installing the Application Software
Contents
- Installing the Application Software
- 1. Installing Flume
- 2. Modifying the Configuration
- 3. Customizing the HBase Sink
1. Installing Flume
There are three servers: master, slave1, and slave2.
Upload the apache-flume-1.8.0-bin.tar.gz archive to root's home directory on master.
Create the installation directory:
mkdir /usr/flume
Grant execute permission on the archive:
cd ~
chmod u+x apache-flume-1.8.0-bin.tar.gz
Extract apache-flume-1.8.0-bin.tar.gz into /usr/flume:
cd ~
tar -zxvf apache-flume-1.8.0-bin.tar.gz -C /usr/flume
Configure the environment variables.
Edit the profile:
vim /etc/profile
Add the following:
#set flume
export FLUME_HOME=/usr/flume/apache-flume-1.8.0-bin
export PATH=$PATH:$FLUME_HOME/bin
Save and exit, then activate the environment variables:
source /etc/profile
Check the version:
flume-ng version
If this fails with:
Error: Could not find or load main class org.apache.flume.tools.GetJavaProperty
comment out the export HBASE_CLASSPATH line in hbase-env.sh under HBase's conf directory.
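The edit above can be made with a one-line sed. The sketch below demonstrates it on a stand-in file; the sample content and the /usr/hbase path are only for illustration, so run the same sed against your actual $HBASE_HOME/conf/hbase-env.sh.

```shell
# Create a stand-in hbase-env.sh with the offending export (illustrative content).
printf 'export HBASE_CLASSPATH=/usr/hbase/conf\n' > hbase-env.sh
# Comment out any line that starts with the HBASE_CLASSPATH export.
sed -i 's/^export HBASE_CLASSPATH/# export HBASE_CLASSPATH/' hbase-env.sh
cat hbase-env.sh
```

After this change, flume-ng version should no longer trip over the HBase classpath.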
2. Modifying the Configuration
Enter the configuration directory:
cd /usr/flume/apache-flume-1.8.0-bin/conf/
Rename the template configuration files:
mv flume-conf.properties.template flume-conf.properties
mv flume-env.sh.template flume-env.sh
Edit the environment file:
vim flume-env.sh
Set the JDK path (adjust to your installation):
export JAVA_HOME=/usr/java/jdk1.8.0_171
Distribute Flume to the other two servers:
scp -r /usr/flume root@slave1:/usr/
scp -r /usr/flume root@slave2:/usr/
Distribute the profile to the other two servers:
scp /etc/profile root@slave1:/etc/
scp /etc/profile root@slave2:/etc/
Activate the environment variables on the slaves:
ssh slave1
source /etc/profile
exit
ssh slave2
source /etc/profile
exit
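The distribution and activation steps above can be driven by a single loop. As a sketch, the commands are only printed here; drop the leading echo to actually run them (passwordless root ssh to the slaves is assumed):

```shell
# Generate the per-host distribution commands; remove `echo` to execute them.
for host in slave1 slave2; do
  echo "scp -r /usr/flume root@$host:/usr/"
  echo "scp /etc/profile root@$host:/etc/"
  echo "ssh root@$host 'source /etc/profile'"
done > dist-commands.txt
cat dist-commands.txt
```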
Modify the configuration file on master.
Enter the configuration directory:
cd /usr/flume/apache-flume-1.8.0-bin/conf/
Edit the configuration:
vim flume-conf.properties
Set the following:
agent1.sources = r1
agent1.channels = hbaseC kafkaC
agent1.sinks = kafkaSink hbaseSink
# source configuration
agent1.sources.r1.type = avro
agent1.sources.r1.channels = hbaseC kafkaC
agent1.sources.r1.bind = master
agent1.sources.r1.port = 5555
agent1.sources.r1.threads = 5
# channel configuration
agent1.channels.hbaseC.type = memory
agent1.channels.hbaseC.capacity = 100000
agent1.channels.hbaseC.transactionCapacity = 100000
agent1.channels.hbaseC.keep-alive = 20
# sink configuration
agent1.sinks.hbaseSink.type = asynchbase
agent1.sinks.hbaseSink.table = weblogs
agent1.sinks.hbaseSink.columnFamily = info
agent1.sinks.hbaseSink.serializer = org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer
agent1.sinks.hbaseSink.channel = hbaseC
agent1.sinks.hbaseSink.serializer.payloadColumn = datatime,userid,searchname,retorder,cliorder,cliurl
# ************************flume + kafka***************************
# channel configuration
agent1.channels.kafkaC.type = memory
agent1.channels.kafkaC.capacity = 100000
agent1.channels.kafkaC.transactionCapacity = 100000
agent1.channels.kafkaC.keep-alive = 20
# sink configuration
agent1.sinks.kafkaSink.channel = kafkaC
agent1.sinks.kafkaSink.type = org.apache.flume.sink.kafka.KafkaSink
agent1.sinks.kafkaSink.topic = weblogs
agent1.sinks.kafkaSink.brokerList = master:9092,slave1:9092,slave2:9092
agent1.sinks.kafkaSink.zookeeperConnect = master:2181,slave1:2181,slave2:2181
agent1.sinks.kafkaSink.requiredAcks = 1
agent1.sinks.kafkaSink.batchSize = 1
agent1.sinks.kafkaSink.serializer.class = kafka.serializer.StringEncoder
Note: Kafka must be installed on master before the flume + kafka section above can be used.
Modify the configuration file on slave1.
Enter the conf directory:
cd /usr/flume/apache-flume-1.8.0-bin/conf/
Edit the configuration file:
vim flume-conf.properties
Set the following:
agent2.sources = r1
agent2.channels = c1
agent2.sinks = k1
# source configuration
agent2.sources.r1.type = exec
agent2.sources.r1.command = tail -F /usr/datas/weblog-flume.log
agent2.sources.r1.channels = c1
# channel configuration
agent2.channels.c1.type = memory
agent2.channels.c1.capacity = 10000
agent2.channels.c1.transactionCapacity = 10000
agent2.channels.c1.keep-alive = 5
# sink configuration
agent2.sinks.k1.type = avro
agent2.sinks.k1.channel = c1
agent2.sinks.k1.hostname = master
agent2.sinks.k1.port = 5555
Modify the configuration file on slave2:
cd /usr/flume/apache-flume-1.8.0-bin/conf/
vim flume-conf.properties
Set the following:
agent3.sources = r1
agent3.channels = c1
agent3.sinks = k1
# source configuration
agent3.sources.r1.type = exec
agent3.sources.r1.command = tail -F /usr/datas/weblog-flume.log
agent3.sources.r1.channels = c1
# channel configuration
agent3.channels.c1.type = memory
agent3.channels.c1.capacity = 10000
agent3.channels.c1.transactionCapacity = 10000
agent3.channels.c1.keep-alive = 5
# sink configuration
agent3.sinks.k1.type = avro
agent3.sinks.k1.channel = c1
agent3.sinks.k1.hostname = master
agent3.sinks.k1.port = 5555
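With all three agents configured, they still have to be started. A hypothetical start script for the collector on master might look like the following; the script name flume-start.sh and the logger flag are assumptions, not from this guide. On slave1 and slave2, substitute agent2 and agent3 for agent1.

```shell
# Write a start script for agent1 (the collector on master).
cat > flume-start.sh <<'EOF'
#!/bin/bash
# Launch the Flume agent named agent1 using the configuration edited above.
flume-ng agent \
  --conf /usr/flume/apache-flume-1.8.0-bin/conf \
  --conf-file /usr/flume/apache-flume-1.8.0-bin/conf/flume-conf.properties \
  --name agent1 \
  -Dflume.root.logger=INFO,console
EOF
chmod +x flume-start.sh
```

Start the collector on master first, so the avro sinks on the slaves have an endpoint to connect to.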
Create the datas directory:
mkdir -p /usr/datas
Upload the prepared dataset to master:
SogouQ.reduced.tar.gz
Extract it into /usr/datas on master:
tar -zxvf SogouQ.reduced.tar.gz -C /usr/datas
Then run the following commands to reformat the data. The fields must be separated by commas, which takes two commands:
cat SogouQ.reduced | tr "\t" "," > weblog2.log
This processes the source file line by line, replacing each tab with a comma, and writes the result to a new file, weblog2.log.
cat weblog2.log | tr " " "," > weblog.log
This processes weblog2.log line by line, replacing each space with a comma, and writes the result to a new file, weblog.log.
Finally, delete the intermediate weblog2.log file.
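The two-step conversion can be checked on a single sample line. The line below is made up to mimic the SogouQ layout (tab-separated fields, with spaces inside the rank field) and is only there to illustrate the tr calls:

```shell
# A made-up sample line in SogouQ-like layout: tabs between fields, spaces
# inside the middle field.
printf '20111230000005\t123456\t[query] 1 2 http://example.com\n' > sample.log
cat sample.log | tr "\t" "," > weblog2.log   # tabs  -> commas
cat weblog2.log | tr " " "," > weblog.log    # spaces -> commas
cat weblog.log
# -> 20111230000005,123456,[query],1,2,http://example.com
```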
3. Customizing the HBase Sink
First download the Flume source distribution.
Open apache-flume-1.8.0-src/flume-ng-sinks/flume-ng-hbase-sink in IDEA and locate the SimpleAsyncHbaseEventSerializer class. Replace its getActions() method with:
@Override
public List<PutRequest> getActions() {
  List<PutRequest> actions = new ArrayList<PutRequest>();
  if (payloadColumn != null) {
    byte[] rowKey;
    try {
      // payloadColumn holds the six comma-separated column names of each row
      String[] columns = new String(this.payloadColumn).split(",");
      String[] values = new String(this.payload).split(",");
      // Skip malformed rows before indexing into the arrays
      if (columns.length != values.length || columns.length < 3) {
        return actions;
      }
      String datetime = String.valueOf(values[0]);
      String userid = String.valueOf(values[1]);
      // getKfkRowKey is a custom method added to SimpleRowKeyGenerator as
      // part of this customization; it builds the row key from userid and
      // datetime
      rowKey = SimpleRowKeyGenerator.getKfkRowKey(userid, datetime);
      // Emit one PutRequest per column
      for (int i = 0; i < columns.length; i++) {
        byte[] colColumn = columns[i].getBytes();
        byte[] colValue = values[i].getBytes(Charsets.UTF_8);
        PutRequest putRequest = new PutRequest(table, rowKey, cf,
            colColumn, colValue);
        actions.add(putRequest);
      }
    } catch (Exception e) {
      throw new FlumeException("Could not get row key!", e);
    }
  }
  return actions;
}
Then package the source; the module builds with Maven (for example, mvn package -DskipTests run from the flume-ng-hbase-sink module).
The resulting jar is flume-ng-hbase-sink.jar. Delete the flume-ng-hbase-sink-1.8.0.jar that ships with Flume, rename the new jar to flume-ng-hbase-sink-1.8.0.jar, and put it in its place.
Delete the old jar from Flume's lib directory:
rm -rf /usr/flume/apache-flume-1.8.0-bin/lib/flume-ng-hbase-sink-1.8.0.jar
Copy the renamed jar in:
cp flume-ng-hbase-sink-1.8.0.jar /usr/flume/apache-flume-1.8.0-bin/lib/
Distribute the jar to slave1 and slave2:
scp /usr/flume/apache-flume-1.8.0-bin/lib/flume-ng-hbase-sink-1.8.0.jar root@slave1:/usr/flume/apache-flume-1.8.0-bin/lib/
scp /usr/flume/apache-flume-1.8.0-bin/lib/flume-ng-hbase-sink-1.8.0.jar root@slave2:/usr/flume/apache-flume-1.8.0-bin/lib/
This concludes this section.