目录:(1)/usr/local/hadoop/(2)hadoop配置目录:/usr/local/hadoop/etc/hadoop(3)/usr/local/hadoop/sbin(4)share目录在/usr/local/hadoop下(5)将HDFS所有文件副本数调整为2,数据块大小为512MB:在/usr/local/hadoop/etc/hadoop/hdfs-site.xml文件中修改(6)将指定HDFS上某个文件(比如/home/hadoop/test.txt)的副本数调整成2:hadoop dfs -setrep -w 2 -R /home/hadoop/test.txt
命令:(1)①重新编译环境变量使配置生效,source /etc/profile
②开启namenode:hadoop-daemon.sh start namenode 关闭namenode:hadoop-daemon.sh stop namenode
③开启datanode:hadoop-daemon.sh start datanode 关闭datanode:hadoop-daemon.sh stop datanode
④开启hdfs:start-dfs.sh 关闭hdfs:stop-dfs.sh
⑤开启yarn:start-yarn.sh 关闭yarn:stop-yarn.sh
(2)hadoop完全分布式 HA高可用搭建过程:
1、使用hadoop用户解压并安装到apps路径下
1.1使用hadoop用户进入到在/home/hadoop/apps目录下
cd /home/hadoop/apps
注意:如果没有/home/hadoop/apps路径,自行在/home/hadoop路径下创建apps文件夹:mkdir /home/Hadoop/apps
1.2使用rz将本机的hadoop安装包上传到/home/hadoop/apps目录下
1.3解压安装文件
tar -zxvf hadoop-2.7.4.tar.gz
1.4使用root用户创建软链接
ln -s /home/hadoop/apps/hadoop-2.7.4 /usr/local/hadoop
1.5使用root用户修改软链接属主
chown -R hadoop:hadoop /usr/local/hadoop
1.6使用root用户将hadoop相关内容添加到环境变量中
注意:Hadoop配置文件路径是/usr/local/hadoop/etc/hadoop
vim /etc/profile
添加内容如下:
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=HADOOP_HOME
export YARN_CONF_DIR=PATH:{HADOOP_HOME}/sbin
1.7使用root用户重新编译环境变量使配置生效
source /etc/profile
2. 配置HDFS
2.1使用hadoop用户进入到Hadoop配置文件路径
cd /usr/local/hadoop/etc/hadoop
2.2修改hadoop-env.sh
修改JDK路径export JAVA_HOME=/usr/local/jdk
2.3 配置core-site.xml(具体配置文件在最下面)
2.4 配置hdfs-site.xml
3. 配置YARN
3.1 修改yarn-site.xml
3.2 修改mapred-site.xml**(在配置mapred-site.xml时,千万注意文件名字一定是mapred-site.xml,不是mapred-site.xml.template,要用命令:cp mapred-site.xml.template mapred-site.xml复制一份然后在配置)**
3.3 在/usr/local/hadoop路径下创建hdpdata文件夹
cd /usr/local/hadoop
mkdir hdpdata
4. 修改slaves文件,设置datanode和nodemanager启动节点主机名称
在slaves文件中添加节点的主机名称
node03
node04
node05
注意:node03,node04,node05是我的虚拟机主机名称,在这三台机器上启动datanode和nodemanager,同学根据自己集群主机名称情况自行修改。
5. 配置hadoop用户免密码登陆
配置node01到node01、node02、node03、node04、node05的免密码登陆
在node01上生产一对钥匙
ssh-keygen -t rsa
将公钥拷贝到其他节点,包括自己本机
ssh-copy-id -i node01
ssh-copy-id -i node02
ssh-copy-id -i node03
ssh-copy-id -i node04
ssh-copy-id -i node05
配置node02到node01、node02、node03、node04、node05的免密码登陆
在node02上生产一对钥匙
ssh-keygen -t rsa
将公钥拷贝到其他节点,包括自己本机
ssh-copy-id -i node01
ssh-copy-id -i node02
ssh-copy-id -i node03
ssh-copy-id -i node04
ssh-copy-id -i node05
注意:两个namenode之间要配置ssh免密码登陆
6. 将配置好的hadoop拷贝到其他节点
scp -r hadoop-2.7.4 hadoop@node02:/home/hadoop/apps
scp -r hadoop-2.7.4 hadoop@node03:/home/hadoop/apps
scp -r hadoop-2.7.4 hadoop@node04:/home/hadoop/apps
scp -r hadoop-2.7.4 hadoop@node05:/home/hadoop/apps
在每个节点分别执行如下四步操作
第一步:使用root用户创建软链接
ln -s /home/hadoop/apps/hadoop-2.7.4 /usr/local/hadoop
第二步:使用root用户修改软链接属主
chown -R hadoop:hadoop /usr/local/hadoop
第三步:使用root用户添加环境变量
vim /etc/profile
添加内容:
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=HADOOP_HOME
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH={HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
第四步:使用root用户重新编译环境变量使配置生效
source /etc/profile
(3)集群规划
(4)启动hadoop集群命令(注意使用hadoop用户启动,严格按照顺序启动):
①启动journalnode,在node03,node04,node05机器上,输入/usr/local/hadoop/sbin/hadoop-daemon.sh start journalnode 然后输入jps命令检验多了journalnode进程
②格式化HDFS,在node01上执行命令:hdfs namenode -format,格式化成功之后会在core-site.xml中的hadoop.tmp.dir制定的路径下生成dfs文件夹,将该文件夹拷贝到node02的相同路径下,scp -r hdpdata hadoop@node02:/usr/local/hadoop
③在node01上执行格式化ZKFC操作:hdfs zkfc -formatZK,执行成功后,日志输出如下信息:INFO ha.ActiveStandbyElector: Successfully created/hadoop-ha/ns in ZK
④在node01上启动HDFS:sbin/start-dfs.sh
⑤在node02上启动YARN:sbin/start-yarn.sh,在node01单独启动一个ResourceManager作为备份节点ResourceManager(Standby):sbin/yarn-daemon.sh start resourcemanager
⑥在node02上启动JobHistoryServer:sbin/mr-jobhistory-daemon.sh start historyserver,启动完成node02会增加一个JobHistoryServer进程
⑦hadoop安装启动完成
HDFS HTTP访问地址:
NameNode (active):http://192.168.183.100:50070
NameNode (standby):http://192.168.183.101:50070
ResourceManager HTTP访问地址:
ResourceManager :http://192.168.183.101:8088
历史日志HTTP访问地址:
JobHistoryServer:http://192.168.183.101:19888
(5)集群验证
①验证HDFS是否正常工作及HA高可用
首先向hdfs上传一个文件:hadoop fs -put /usr/local/hadoop/README.txt /
在active节点手动关闭active的namenode:sbin/hadoop-daemon.sh stop namenode
通过HTTP 50070端口查看standby namenode的状态是否转换为active,手动启动上一步关闭的namenode:sbin/hadoop-daemon.sh start namenode
②验证YARN是否正常工作及ResourceManager HA高可用
运行测试hadoop提供的demo中的WordCount程序:
hadoop fs -mkdir /wordcount
hadoop fs -mkdir /wordcount/input
hadoop fs -mv /README.txt /wordcount/input
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce/hadoop-mapreduce-examples-2.7.4.jar wordcount /wordcount/input /wordcount/output。share目录在/usr/local/hadoop下
验证ResourceManager HA
手动关闭node02的ResourceManager:sbin/hadoop-daemon.sh stop resourcemanager
通过HTTP 8088端口访问node01的ResourceManager查看状态
手动启动node02的ResourceManager:sbin/yarn-daemon.sh start resourcemanager
配置文件(1)core-site.xml
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an “AS IS” BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
–>
fs.defaultFS hdfs://ns
hadoop.tmp.dir /usr/local/hadoop/hdpdata/ 需要手动创建hdpdata目录
ha.zookeeper.quorum node01:2181,node02:2181,node03:2181 zookeeper地址,多个用逗号隔开
(2)hdfs-site.xml
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an “AS IS” BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
–>
dfs.nameservices ns 指定hdfs的nameservice为ns,需要和core-site.xml中的保持一致 dfs.ha.namenodes.ns nn1,nn2 ns命名空间下有两个NameNode,逻辑代号,随便起名字,分别是nn1,nn2 dfs.namenode.rpc-address.ns.nn1 node01:9000 nn1的RPC通信地址 dfs.namenode.http-address.ns.nn1 node01:50070 nn1的http通信地址 dfs.namenode.rpc-address.ns.nn2 node02:9000 nn2的RPC通信地址 dfs.namenode.http-address.ns.nn2 node02:50070 nn2的http通信地址
dfs.namenode.shared.edits.dir qjournal://node03:8485;node04:8485;node05:8485/ns 指定NameNode的edits元数据在JournalNode上的存放位置 dfs.journalnode.edits.dir /usr/local/hadoop/journaldata 指定JournalNode在本地磁盘存放数据的位置
dfs.ha.automatic-failover.enabled true 开启NameNode失败自动切换 dfs.client.failover.proxy.provider.ns org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider 配置失败自动切换实现方式,使用内置的zkfc dfs.ha.fencing.methods sshfence shell(/bin/true) 配置隔离机制,多个机制用换行分割,先执行sshfence,执行失败后执行shell(/bin/true),/bin/true会直接返回0表示成功 dfs.ha.fencing.ssh.private-key-files /home/hadoop/.ssh/id\_rsa 使用sshfence隔离机制时需要ssh免登陆 dfs.ha.fencing.ssh.connect-timeout 30000 配置sshfence隔离机制超时时间
dfs.replication 3 设置block副本数为3 dfs.block.size 134217728 设置block大小是128M
(3)mapred-site.xml
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an “AS IS” BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
–>
mapreduce.framework.name yarn 指定mr框架为yarn方式
mapreduce.jobhistory.address node02:10020 历史服务器端口号 mapreduce.jobhistory.webapp.address node02:19888 历史服务器的WEB UI端口号 mapreduce.jobhistory.joblist.cache.size 2000 内存中缓存的historyfile文件信息(主要是job对应的文件目录)
(4)yarn-site.xml
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an “AS IS” BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
–>