# Virtual machine environment configuration
## Cluster planning
(Figure: cluster planning table — node roles, including where the metadata service storage lives)
## Network interface configuration
vim /etc/sysconfig/network-scripts/ifcfg-eth0
Contents:
DEVICE=eth0
TYPE=Ethernet
#UUID=4e24d937-945e-4253-9334-7a6335a4cada
ONBOOT=yes
NM_CONTROLLED=yes
IPV6INIT=no
BOOTPROTO=static
IPADDR=10.20.8.164
GATEWAY=10.20.8.254
NETMASK=255.255.255.0
HWADDR=00:0C:29:2E:6B:C6
## Hostname
vim /etc/sysconfig/network
Contents:
NETWORKING=yes
HOSTNAME=masterZH
NETWORKING_IPV6=no
## Hosts file
vim /etc/hosts
Contents:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.20.8.164 masterZH
10.20.8.165 master2ZH
10.20.8.166 slave1ZH
10.20.8.167 slave2ZH
10.20.8.168 slave3ZH
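Every node needs an identical /etc/hosts. A minimal sketch for pushing it out, assuming it was edited on masterZH first (you will be prompted for passwords until the SSH keys are set up later):

```bash
# Copy the same /etc/hosts to every other node (hostnames as in the table above).
for h in master2ZH slave1ZH slave2ZH slave3ZH; do
  scp /etc/hosts root@"$h":/etc/hosts
done
```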
## Firewall and SELinux
Stop the firewall:
service iptables stop
Disable the firewall on boot:
chkconfig iptables off
Set SELinux to permissive for the current session:
setenforce 0
Disable SELinux permanently:
vim /etc/selinux/config
Contents:
SELINUX=disabled
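A quick check that the firewall and SELinux changes took effect (CentOS 6 commands):

```bash
service iptables status     # should report that the firewall is not running
chkconfig --list iptables   # every runlevel should show "off"
getenforce                  # Permissive now; Disabled after the next reboot
```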
## NTP service
NTP server (masterZH) configuration:
vi /etc/ntp.conf
Contents:
driftfile /var/lib/ntp/drift
restrict 127.0.0.1
restrict -6 ::1
restrict default nomodify notrap
server 127.127.1.0 #local clock
fudge 127.127.1.0 stratum 10
includefile /etc/ntp/crypto/pw
keys /etc/ntp/keys
Start the service:
service ntpd restart
NTP configuration on the remaining nodes:
vim /etc/ntp.conf
Contents:
driftfile /var/lib/ntp/drift
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
restrict 127.0.0.1
restrict -6 ::1
server 10.20.8.164
includefile /etc/ntp/crypto/pw
keys /etc/ntp/keys
Restart ntpd and sync the time:
service ntpd restart
ntpdate -u masterZH
Enable ntpd at boot:
chkconfig ntpd on
Verify:
chkconfig --list ntpd
Sync the time automatically (once per hour):
crontab -e
Add the following line:
0 * * * * /usr/sbin/ntpdate -u masterZH
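To confirm that a node is actually tracking masterZH:

```bash
ntpq -p    # masterZH (10.20.8.164) should appear in the peer list
ntpstat    # reports whether the local clock is synchronised
```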
## Passwordless SSH login
On every host in the cluster, open the sshd configuration:
sudo vim /etc/ssh/sshd_config
Enable the following options:
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
Generate an SSH key pair (press Enter at every prompt):
ssh-keygen -t rsa
This creates ~/.ssh/id_rsa.pub, which holds the public key.
Copy the root public key of every machine to the master:
ssh-copy-id root@masterZH
masterZH then ends up with an ~/.ssh/authorized_keys file containing one line per host:
ssh-rsa ……
ssh-rsa ……
ssh-rsa ……
Append masterZH's own public key to authorized_keys:
cat id_rsa.pub >> authorized_keys
Then copy masterZH's authorized_keys to master2ZH, slave1ZH, slave2ZH, and slave3ZH:
scp ~/.ssh/authorized_keys root@master2ZH:~/.ssh/
scp ~/.ssh/authorized_keys root@slave1ZH:~/.ssh/
...
Restart the SSH service:
sudo service sshd restart
Test from masterZH:
ssh slave1ZH
ssh master2ZH
Test from slave2ZH:
ssh masterZH
ssh slave3ZH
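A one-shot loop to confirm passwordless login from masterZH to every node (a sketch using the hostnames above):

```bash
for h in masterZH master2ZH slave1ZH slave2ZH slave3ZH; do
  ssh root@"$h" hostname   # should print each hostname without asking for a password
done
```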
# zookeeper-3.5.1 installation
## Download zookeeper-3.5.1
Download URL:
https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/ (choose the version you need)
## ZooKeeper configuration
Create the ZooKeeper directory:
mkdir /opt/zookeeper/
Extract into it and rename:
tar -xzvf ./zookeeper-3.5.1-alpha.tar.gz -C /opt/zookeeper/
mv /opt/zookeeper/zookeeper-3.5.1-alpha /opt/zookeeper/3.5.1
Edit the configuration file:
cd /opt/zookeeper/3.5.1/conf
cp zoo_sample.cfg zoo.cfg
vi zoo.cfg
zoo.cfg contents:
tickTime=2000
initLimit=5
syncLimit=2
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=slave1ZH:2888:3888
server.2=slave2ZH:2888:3888
server.3=slave3ZH:2888:3888
Create the ZooKeeper data directory:
mkdir /var/lib/zookeeper
Create the id file on each ZooKeeper node; the value must match the server.N entries above (see the sketch below):
echo "1" > /var/lib/zookeeper/myid   # on slave1ZH
echo "2" > /var/lib/zookeeper/myid   # on slave2ZH
echo "3" > /var/lib/zookeeper/myid   # on slave3ZH
Register ZooKeeper as a system service:
cd /etc/rc.d/init.d/
pwd
touch zookeeper
chmod +x zookeeper
vi zookeeper
Contents:
#!/bin/bash
# chkconfig: 2345 20 90
# description: ZooKeeper service
# processname: zookeeper
export JAVA_HOME=/usr/local/java/jdk1.8.0_151
export PATH=$JAVA_HOME/bin:$PATH
case $1 in
start) su root /opt/zookeeper/3.5.1/bin/zkServer.sh start;;
stop) su root /opt/zookeeper/3.5.1/bin/zkServer.sh stop;;
status) su root /opt/zookeeper/3.5.1/bin/zkServer.sh status;;
restart) su root /opt/zookeeper/3.5.1/bin/zkServer.sh restart;;
*) echo "require start|stop|status|restart" ;;
esac
Verify that the service commands work:
service zookeeper start
service zookeeper status
service zookeeper stop
service zookeeper status
Register the service and enable it at boot:
chkconfig --add zookeeper
chkconfig zookeeper on
With that, the ZooKeeper setup is complete.
## ZooKeeper processes and status
Check the ZooKeeper process and the role (leader or follower) of each node.
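Both checks can be done from the shell with the paths configured above:

```bash
jps                                           # each ZooKeeper node should list QuorumPeerMain
/opt/zookeeper/3.5.1/bin/zkServer.sh status   # prints Mode: leader or Mode: follower
service zookeeper status                      # same check through the init script above
```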
References:
- Installing ZooKeeper: http://docs.electric-cloud.com/commander_doc/5_0_2/HTML5/Install/Content/Install%20Guide/horizontal_scalability/9InstallZookeeper.htm
- Setting ZooKeeper to start at boot
# hadoop-2.7.2 HA cluster installation
## Download hadoop-2.7.2
Download URL:
http://archive.apache.org/dist/hadoop/common/hadoop-2.7.2/ (download the following three files)
hadoop-2.7.2.tar.gz
hadoop-2.7.2.tar.gz.asc
hadoop-2.7.2.tar.gz.mds
## Hadoop configuration
Create the Hadoop directory:
mkdir /opt/hadoop/
Extract into it and rename:
tar -xzvf ./hadoop-2.7.2.tar.gz -C /opt/hadoop/
mv /opt/hadoop/hadoop-2.7.2 /opt/hadoop/2.7.2
Configure core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://ns1/</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/opt/hadoop/2.7.2/data/tmp</value>
</property>
<property>
<name>hadoop.proxyuser.hduser.hosts</name>
<value>*</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>slave1ZH:2181,slave2ZH:2181,slave3ZH:2181</value>
</property>
</configuration>
Configure hdfs-site.xml:
<configuration>
<property>
<name>dfs.nameservices</name>
<value>ns1</value>
</property>
<property>
<name>dfs.ha.namenodes.ns1</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns1.nn1</name>
<value>masterZH:9000</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns1.nn2</name>
<value>master2ZH:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.ns1.nn1</name>
<value>masterZH:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.ns1.nn2</name>
<value>master2ZH:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://slave1ZH:8485;slave2ZH:8485;slave3ZH:8485/ns1</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.ns1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/hadoop/2.7.2/data/tmp/journal</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled.ns1</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/opt/hadoop/2.7.2/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/hadoop/2.7.2/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.journalnode.http-address</name>
<value>0.0.0.0:8480</value>
</property>
<property>
<name>dfs.journalnode.rpc-address</name>
<value>0.0.0.0:8485</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>slave1ZH:2181,slave2ZH:2181,slave3ZH:2181</value>
</property>
</configuration>
Configure yarn-site.xml:
<configuration>
<property>
<name>yarn.resourcemanager.connect.retry-interval.ms</name>
<value>2000</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>slave1ZH:2181,slave2ZH:2181,slave3ZH:2181</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>masterZH</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>master2ZH</value>
</property>
<property>
<name>yarn.resourcemanager.ha.id</name>
<value>rm1</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.zk-state-store.address</name>
<value>slave1ZH:2181,slave2ZH:2181,slave3ZH:2181</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>slave1ZH:2181,slave2ZH:2181,slave3ZH:2181</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yrc</value>
</property>
<property>
<name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
<value>5000</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>masterZH:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>masterZH:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>masterZH:8031</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>masterZH:8132</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>masterZH:8130</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>masterZH:8188</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>masterZH:8131</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm1</name>
<value>masterZH:8033</value>
</property>
<property>
<name>yarn.resourcemanager.ha.admin.address.rm1</name>
<value>masterZH:23142</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>master2ZH:8132</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>master2ZH:8130</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>master2ZH:8188</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>master2ZH:8131</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm2</name>
<value>master2ZH:8033</value>
</property>
<property>
<name>yarn.resourcemanager.ha.admin.address.rm2</name>
<value>master2ZH:23142</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/opt/hadoop/2.7.2/yarn/local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/opt/hadoop/2.7.2/logs</value>
</property>
<property>
<name>mapreduce.shuffle.port</name>
<value>23080</value>
</property>
<property>
<name>yarn.client.failover-proxy-provider</name>
<value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
<value>/yarn-leader-election</value>
</property>
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
</configuration>
Configure mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master2ZH:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master2ZH:19888</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/user</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx2g</value>
</property>
<property>
<name>io.sort.mb</name>
<value>512</value>
</property>
<property>
<name>io.sort.factor</name>
<value>20</value>
</property>
<property>
<name>mapred.job.reuse.jvm.num.tasks</name>
<value>-1</value>
</property>
<property>
<name>mapreduce.reduce.shuffle.parallelcopies</name>
<value>20</value>
</property>
</configuration>
Configure slaves:
vi /opt/hadoop/2.7.2/etc/hadoop/slaves
Contents:
slave1ZH
slave2ZH
slave3ZH
Configure hadoop-env.sh:
vi ./hadoop-env.sh
Contents:
export JAVA_HOME=/usr/local/java/jdk1.8.0_151
export HADOOP_SSH_OPTS="-p 22"
export HADOOP_LOG_DIR=/opt/hadoop/2.7.2/logs
Configure yarn-env.sh:
vi ./yarn-env.sh
Contents:
export JAVA_HOME=/usr/local/java/jdk1.8.0_151
export YARN_LOG_DIR=/opt/hadoop/2.7.2/logs
Set the environment variables:
vi /etc/profile
Add:
export HADOOP_HOME=/opt/hadoop/2.7.2
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Apply them immediately:
source /etc/profile
Distribute the configured Hadoop directory and the /etc/profile changes to every node (a sketch follows).
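A minimal sketch of the distribution step, assuming the passwordless root SSH configured earlier (it mirrors the scp pattern used later for Spark and HBase):

```bash
for h in master2ZH slave1ZH slave2ZH slave3ZH; do
  ssh root@"$h" 'mkdir -p /opt/hadoop'
  scp -r /opt/hadoop/2.7.2 root@"$h":/opt/hadoop/
  scp /etc/profile root@"$h":/etc/profile    # then run `source /etc/profile` on each node
done
```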
After distributing, change yarn.resourcemanager.ha.id on master2ZH to rm2:
<property>
<name>yarn.resourcemanager.ha.id</name>
<value>rm2</value>
</property>
## Start Hadoop
With ZooKeeper running, execute the following steps.
Start a journalnode on slave1ZH, slave2ZH, and slave3ZH:
sbin/hadoop-daemon.sh start journalnode
Format the HA state in ZooKeeper (only on a fresh install or reinstall), on masterZH:
bin/hdfs zkfc -formatZK
Format and start the NameNode on masterZH:
bin/hdfs namenode -format    # only on a fresh install or reinstall
sbin/hadoop-daemon.sh start namenode
Bootstrap and start the standby NameNode on master2ZH:
bin/hdfs namenode -bootstrapStandby    # only on a fresh install or reinstall
sbin/hadoop-daemon.sh start namenode
Start the zkfc service on masterZH and master2ZH:
sbin/hadoop-daemon.sh start zkfc
Start the DataNodes from masterZH:
sbin/hadoop-daemons.sh start datanode
Start YARN on masterZH, and the second ResourceManager on master2ZH:
sbin/start-yarn.sh
sbin/yarn-daemon.sh start resourcemanager
Start the historyserver on master2ZH:
sbin/mr-jobhistory-daemon.sh start historyserver
Hadoop processes on the cluster nodes (the original screenshots of masterZH, master2ZH, and slave*ZH are omitted; see the sketch below).
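For reference, assuming every daemon started cleanly, jps on each node should report roughly the following:

```bash
jps
# masterZH : NameNode, DFSZKFailoverController, ResourceManager
# master2ZH: NameNode, DFSZKFailoverController, ResourceManager, JobHistoryServer
# slave*ZH : DataNode, NodeManager, JournalNode, QuorumPeerMain
```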
## Hadoop command test and port check
Create an HDFS directory and change its permissions:
hadoop fs -mkdir /tmp
hadoop fs -chmod -R 777 /tmp
Check the cluster status pages:
masterZH:
(Figure: NameNode web UI on masterZH)
master2ZH shows the standby state:
(Figure: NameNode web UI on master2ZH, standby)
Check the running applications:
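The HA state and the web UIs can also be checked from the command line (addresses as configured in hdfs-site.xml and yarn-site.xml above):

```bash
hdfs haadmin -getServiceState nn1     # expect active
hdfs haadmin -getServiceState nn2     # expect standby
yarn rmadmin -getServiceState rm1     # expect active
yarn rmadmin -getServiceState rm2     # expect standby
# Web UIs: http://masterZH:50070 and http://master2ZH:50070 (HDFS)
#          http://masterZH:8188  and http://master2ZH:8188  (YARN)
```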
# mysql-5.7.19 installation
## Remove any existing mysql and mariadb
List the installed mariadb/mysql packages:
rpm -qa | grep mariadb
rpm -qa | grep mysql
Remove them (substitute the package name reported above) and the old configuration:
rpm -e --nodeps filename
rm /etc/my.cnf
## Install and configure mysql
Extract the bundle and install the RPMs:
tar -xvf mysql-5.7.19-1.el6.x86_64.rpm-bundle.tar
rpm -ivh mysql-community-common-5.7.19-1.el6.x86_64.rpm
rpm -ivh mysql-community-libs-5.7.19-1.el6.x86_64.rpm
rpm -ivh mysql-community-client-5.7.19-1.el6.x86_64.rpm
rpm -ivh mysql-community-server-5.7.19-1.el6.x86_64.rpm
Skip authentication temporarily so you can log in without a password:
vi /etc/my.cnf
Add the line:
skip-grant-tables
Restart MySQL: service mysqld restart, then log in with: mysql
While skip-grant-tables is active, run flush privileges; first so that account-management statements work, then create the hive database, the hive user, and its privileges as below. Once the root password is reset, remove skip-grant-tables from /etc/my.cnf and restart mysqld.
Metastore host 1: masterZH (10.20.8.164)
set global validate_password_policy=0;
ALTER USER root@localhost IDENTIFIED BY '12345678';
create database hive_metadata DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
use hive_metadata;
create user hive@10.20.8.164 IDENTIFIED by '12345678';
revoke all privileges on *.* from hive@10.20.8.164;
revoke grant option on *.* from hive@10.20.8.164;
grant all on hive_metadata.* to hive@'10.20.8.164' IDENTIFIED by '12345678';
grant all on hive_metadata.* to hive@'%' IDENTIFIED by '12345678';
flush privileges;
Metastore host 2: master2ZH (10.20.8.165)
create user hive@10.20.8.165 IDENTIFIED by '12345678';
revoke all privileges on *.* from hive@10.20.8.165;
revoke grant option on *.* from hive@10.20.8.165;
grant all on hive_metadata.* to hive@'10.20.8.165' IDENTIFIED by '12345678';
grant all on hive_metadata.* to hive@'%' IDENTIFIED by '12345678';
flush privileges;
Change the character set of the hive metadata database:
alter database hive_metadata character set latin1;
quit;
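A quick sanity check of the hive account, run from masterZH and assuming the MySQL server is the 10.20.8.169 host referenced later in hive-site.xml:

```bash
mysql -h 10.20.8.169 -u hive -p'12345678' -e 'show databases;'   # hive_metadata should be listed
```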
# hive-1.2.2 installation
## Download hive-1.2.2
Download URL:
http://www-eu.apache.org/dist/hive/stable/
Download the following three files:
apache-hive-1.2.2-bin.tar.gz
apache-hive-1.2.2-bin.tar.gz.asc
apache-hive-1.2.2-bin.tar.gz.md5
## Hive configuration
Create the Hive directory:
mkdir /opt/hive/
Extract into it and rename:
tar -zxvf apache-hive-1.2.2-bin.tar.gz -C /opt/hive/
mv /opt/hive/apache-hive-1.2.2-bin /opt/hive/1.2.2
Configure the environment variables:
cd /etc/profile.d
touch hive-1.2.2.sh
vi hive-1.2.2.sh
Contents:
# set hive environment
HIVE_HOME=/opt/hive/1.2.2
PATH=$HIVE_HOME/bin:$PATH
CLASSPATH=$CLASSPATH:$HIVE_HOME/lib
export HIVE_HOME
export PATH
export CLASSPATH
Apply immediately:
source /etc/profile
Configure hive-env.sh:
Go to the configuration directory:
cd /opt/hive/1.2.2/conf
Copy the template and open it:
cp hive-env.sh.template hive-env.sh
vi hive-env.sh
Set the following:
# set HADOOP_HOME to point specific hadoop install directory
HADOOP_HOME=/opt/hadoop/2.7.2
# hive configure directory can be controlled by:
export HIVE_CONF_DIR=/opt/hive/1.2.2/conf
Configure the Hive log4j files:
Copy the templates:
cp hive-exec-log4j.properties.template hive-exec-log4j.properties
cp hive-log4j.properties.template hive-log4j.properties
In both files, set:
hive.log.dir=/opt/hive/1.2.2/logs
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
The Hive configuration references several HDFS paths that must be created manually:
hdfs dfs -mkdir -p /usr/hive/warehouse
hdfs dfs -mkdir -p /usr/hive/tmp
hdfs dfs -mkdir -p /usr/hive/log
hdfs dfs -chmod 777 /usr/hive/warehouse
hdfs dfs -chmod 777 /usr/hive/tmp
Server-side hive-site.xml (for the two metastore hosts):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://10.20.8.169:3306/hive_metadata?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
<description>the URL of the MySQL database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>12345678</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/usr/hive/warehouse</value>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/usr/hive/tmp</value>
</property>
<property>
<name>hive.querylog.location</name>
<value>/usr/hive/log</value>
</property>
</configuration>
Client-side hive-site.xml (for the other nodes):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.metastore.uris</name>
<value>thrift://masterZH:9083,master2ZH:9083</value>
<description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/usr/hive/warehouse</value>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/usr/hive/tmp</value>
</property>
<property>
<name>hive.querylog.location</name>
<value>/usr/hive/log</value>
</property>
</configuration>
## Metastore database JDBC driver
tar -zxvf ./mysql-connector-java-5.1.44.tar.gz -C /opt/hive/1.2.2/lib
cd /opt/hive/1.2.2/lib
mv mysql-connector-java-5.1.44/mysql-connector-java-5.1.44-bin.jar ./
## Distribute the configured Hive
Distribute the Hive directory and the environment script to the remaining nodes (two metastore servers and three clients; their hive-site.xml must use the server-side and client-side configuration respectively). Then apply the environment variables on each node:
source /etc/profile
Initialize the metastore schema:
/opt/hive/1.2.2/bin/schematool -dbType mysql -initSchema
## Start the metastore service
On each metastore server, run:
hive --service metastore &
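To keep the metastore running after the shell exits, a nohup variant can be used (the log path here is an assumption, not part of the original setup):

```bash
nohup hive --service metastore > /opt/hive/1.2.2/logs/metastore.log 2>&1 &
```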
With that, the Hive installation is complete.
## Table creation test
hdfs dfs -mkdir -p /data/test/
hdfs dfs -copyFromLocal users.dat /data/test/
users.dat looks like this:
1::F::1::10::48067
2::M::56::16::70072
Enter the Hive client:
hive
At the prompt, run:
show databases;
create database hivetest;
use hivetest;
create table users(UserID BigInt, Gender String, Age Int, Occupation String, Zipcode String) partitioned by (dt String) row format delimited fields terminated by '::';
load data inpath '/data/test/users.dat' into table users partition(dt=20171214);
select count(1) from users;
(Figure: output of the queries above)
# spark-1.5.1 installation
## Download spark-1.5.1
Download URL:
http://archive.apache.org/dist/spark/spark-1.5.1/
Download the following three files:
spark-1.5.1-bin-without-hadoop.tgz
spark-1.5.1-bin-without-hadoop.tgz.asc
spark-1.5.1-bin-without-hadoop.tgz.md5.txt
## Spark configuration
Create the Spark directory:
mkdir /opt/spark/
Extract into it and rename:
tar -zxf spark-1.5.1-bin-without-hadoop.tgz -C /opt/spark/
mv /opt/spark/spark-1.5.1-bin-without-hadoop /opt/spark/1.5.1
Copy the configuration templates:
cd /opt/spark/1.5.1/conf
cp spark-env.sh.template spark-env.sh
cp spark-defaults.conf.template spark-defaults.conf
cp log4j.properties.template log4j.properties
Configure spark-env.sh:
vi conf/spark-env.sh
Contents:
export HADOOP_HOME=/opt/hadoop/2.7.2
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
Configure slaves:
vi slaves
Contents:
slave1ZH
slave2ZH
slave3ZH
Configure the environment variables:
cd /etc/profile.d
touch spark-1.5.1.sh
vi spark-1.5.1.sh
Contents:
# set spark environment
SPARK_HOME=/opt/spark/1.5.1
PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
export SPARK_HOME
export PATH
Apply immediately:
source /etc/profile
Lower the console log level:
vi log4j.properties
Set:
log4j.rootCategory=WARN, console
Configure the history server:
vi spark-defaults.conf
Add the following (comments must sit on their own line in this file):
# Turns on logging for applications submitted from this machine
# (the event log directory can also be an HDFS path)
spark.eventLog.dir /opt/spark/1.5.1/events
spark.eventLog.enabled true
# Sets the logging directory for the history server (keep it identical to spark.eventLog.dir)
spark.history.fs.logDirectory /opt/spark/1.5.1/events
spark.history.fs.cleaner.enabled true
spark.history.fs.cleaner.interval 5d
spark.history.fs.cleaner.maxAge 15d
Create the event log directory on every node:
mkdir /opt/spark/1.5.1/events
Distribute to the other nodes.
On each of the other nodes, run:
mkdir -p /opt/spark/1.5.1
On the node that holds the configured copy, run:
scp -r ./* root@masterZH:/opt/spark/1.5.1
scp -r ./* root@slave1ZH:/opt/spark/1.5.1
...
scp /etc/profile.d/spark-1.5.1.sh root@masterZH:/etc/profile.d/
scp /etc/profile.d/spark-1.5.1.sh root@slave1ZH:/etc/profile.d/
...
Then on every node:
source /etc/profile
## Start Spark
Start the master:
start-master.sh
Start the workers:
start-slaves.sh
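start-slaves.sh does not start the history server configured in spark-defaults.conf above; assuming the standard sbin layout, it can be started separately:

```bash
$SPARK_HOME/sbin/start-history-server.sh   # web UI defaults to port 18080
```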
Check the processes on the master and worker nodes (screenshots of masterZH, master2ZH, and slave*ZH omitted).
## Test Spark
Run the bundled example from $SPARK_HOME/bin; it should finish successfully:
./run-example SparkPi 10
Check the master web UIs:
masterZH:
(Figure: Spark master web UI on masterZH)
master2ZH:
(Figure: Spark master web UI on master2ZH)
# hbase-1.0.2 installation
## Download hbase-1.0.2
Download URL:
http://archive.apache.org/dist/hbase/hbase-1.0.2/ (download the following three files)
hbase-1.0.2-bin.tar.gz
hbase-1.0.2-bin.tar.gz.asc
hbase-1.0.2-bin.tar.gz.mds
## HBase configuration
Create the HBase directory:
mkdir /opt/hbase/
Extract into it and rename:
tar -zvxf hbase-1.0.2-bin.tar.gz -C /opt/hbase/
mv /opt/hbase/hbase-1.0.2 /opt/hbase/1.0.2
Configure the environment variables:
cd /etc/profile.d
touch hbase-1.0.2.sh
vi hbase-1.0.2.sh
Contents:
# set hbase environment
HBASE_HOME=/opt/hbase/1.0.2
PATH=$PATH:$HBASE_HOME/bin
export HBASE_HOME
export PATH
Apply immediately:
source /etc/profile
Configure hbase-env.sh:
cd /opt/hbase/1.0.2/conf
vi hbase-env.sh
Set the following items:
export JAVA_HOME=/usr/local/java/jdk1.8.0_151
export HBASE_MANAGES_ZK=false
Configure hbase-site.xml:
vi hbase-site.xml
Contents:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://ns1/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.master</name>
<value>60000</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>file:///opt/hbase/1.0.2/tmp</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>slave1ZH,slave2ZH,slave3ZH</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/var/lib/zookeeper</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>zookeeper.session.timeout</name>
<value>120000</value>
</property>
<property>
<name>hbase.regionserver.restart.on.zk.expire</name>
<value>true</value>
</property>
</configuration>
Edit regionservers:
vi regionservers
Contents:
slave1ZH
slave2ZH
slave3ZH
Copy Hadoop's core-site.xml and hdfs-site.xml into HBase's conf directory:
cp ${HADOOP_HOME}/etc/hadoop/hdfs-site.xml ${HBASE_HOME}/conf/
cp ${HADOOP_HOME}/etc/hadoop/core-site.xml ${HBASE_HOME}/conf/
Replace the hadoop-common jar under lib with the one from the Hadoop installation:
cp ${HADOOP_HOME}/share/hadoop/common/hadoop-common-2.7.2.jar ${HBASE_HOME}/lib/
Distribute to the other nodes.
On each of the other nodes, run:
mkdir -p /opt/hbase/1.0.2
On the node that holds the configured copy, run:
scp -r /opt/hbase/1.0.2/ root@masterZH:/opt/hbase/
scp -r /opt/hbase/1.0.2/ root@slave1ZH:/opt/hbase/
…
scp /etc/profile.d/hbase-1.0.2.sh root@masterZH:/etc/profile.d/
scp /etc/profile.d/hbase-1.0.2.sh root@slave1ZH:/etc/profile.d/
…
source /etc/profile
Verify HBase:
On one master, run:
start-hbase.sh
On the other master, run:
hbase-daemon.sh start master
On one of the ZooKeeper nodes, run zkCli.sh and then:
ls /hbase/backup-masters
The backup master is listed:
[masterzh,16020,1513668117269]
More detail is available at:
http://master2zh:16010/master-status
Check the HBase directory structure on HDFS:
hadoop fs -ls /hbase
Run the HBase shell:
hbase shell
At the prompt, run:
create 'member','member_id','address','info'
describe 'member'
list
exit
Check the processes on the master and slave nodes (screenshots of masterZH, master2ZH, and slave*ZH omitted).
Check the HBase web UIs:
masterZH:
(Figure: HBase master web UI on masterZH)
master2ZH:
(Figure: HBase master web UI on master2ZH)
# solr-5.3.1 installation
## Solr configuration
Install Solr with the bundled installation script:
tar xvf solr-5.3.1.tgz solr-5.3.1/bin/install_solr_service.sh --strip-components=2    # extract the installer
./install_solr_service.sh solr-5.3.1.tgz -i /opt -d /var/solr -u root -s solr -p 8983    # install
Point solr.in.sh at the ZooKeeper ensemble:
vi /opt/solr-5.3.1/bin/solr.in.sh
vi /var/solr/solr.in.sh
In both files, set:
ZK_HOST="slave1ZH:2181,slave2ZH:2181,slave3ZH:2181"
Restart the service:
service solr restart
Create a test collection:
pwd
cd /opt/solr
pwd
bin/solr create -c testcollection -d data_driven_schema_configs -s 3 -rf 2 -n myconf
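A quick check that the collection was created, using the Collections API and an empty query (slave1ZH is the node whose admin UI is referenced below):

```bash
curl 'http://slave1ZH:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json'
curl 'http://slave1ZH:8983/solr/testcollection/select?q=*:*&rows=0&wt=json'
```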
Check ZooKeeper (via zkCli.sh) and the Solr log directory:
cd /opt/zookeeper/3.5.1/
bin/zkCli.sh
cd /var/solr/logs
ll
Check the port:
netstat -nplt | grep 8983
Open the Solr admin page:
http://slave1zh:8983/solr/#/~cloud
(Figure: testcollection shards in the Solr admin cloud view)
Distribution to the other nodes is omitted here.
The Kafka installation is also omitted (the post hit the length limit). Questions and comments are welcome. This weekend I plan to write up a practical, basic graph algorithm for big data here in the community.