Three CentOS 7 machines: cdh50-121, cdh50-122, and cdh50-127. All use the hdfs user, and passwordless SSH trust is set up between them.

I. Pre-installation preparation

  1. Download hadoop-3.1.3.tar.gz from https://archive.apache.org/dist/hadoop/common/hadoop-3.1.3/
  2. For convenience, Hadoop is installed under the hdfs user's home directory (/home/hdfs); all subsequent steps are performed as the hdfs user
  3. Place the tarball in the home directory, extract it, and rename the extracted folder to hadoop
cd ~
tar zxf hadoop-3.1.3.tar.gz
mv hadoop-3.1.3 hadoop
  4. Make sure the hosts file is configured on all three machines and set up passwordless SSH trust between them (a sketch follows this list)
  5. Install the JDK, omitted ……
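A minimal sketch of the passwordless SSH setup mentioned above, assuming RSA keys for the hdfs user and that the three hostnames already resolve via /etc/hosts; run on each of the three machines (each node should also be able to ssh to itself):
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
ssh-copy-id hdfs@cdh50-121
ssh-copy-id hdfs@cdh50-122
ssh-copy-id hdfs@cdh50-127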

II. Configure environment variables

Edit the hidden file .bashrc in the home directory: run vi ~/.bashrc and append the following at the end

export JAVA_HOME=/usr/java/jdk1.8.0_131
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
export HADOOP_HOME=/home/hdfs/hadoop
export HADOOP_HDFS_HOME=$HADOOP_HOME
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Apply the environment variables by running source ~/.bashrc
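A quick sanity check that the variables took effect, assuming the JDK and Hadoop paths above; the commands should print the Java and Hadoop versions and the install path:
java -version
hadoop version
echo $HADOOP_HOME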

III. Configuration files

The configuration files are located in /home/hdfs/hadoop/etc/hadoop/

  1. hadoop-env.sh
export JAVA_HOME=${JAVA_HOME}
export HADOOP_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
export HADOOP_OS_TYPE=${HADOOP_OS_TYPE:-$(uname -s)}
export HADOOP_CLIENT_OPTS="-Xmx4096m"
export HADOOP_SSH_OPTS="-o BatchMode=yes -o StrictHostKeyChecking=no -o ConnectTimeout=10s"
export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
export HADOOP_PID_DIR=${HADOOP_HOME}/tmp
export HADOOP_SECURE_PID_DIR=${HADOOP_PID_DIR}
export HADOOP_SECURE_LOG=${HADOOP_LOG_DIR}
export HDFS_PORTMAP_OPTS="-Xmx4096m"
export HDFS_NAMENODE_USER=hdfs
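The log and PID directories above point inside the Hadoop install. Hadoop normally creates them at startup, but creating them up front on every node (a sketch, using the paths assumed above) avoids permission surprises:
mkdir -p /home/hdfs/hadoop/logs /home/hdfs/hadoop/tmp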
  2. core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ns1</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>

    <!-- Trash (recycle bin) settings, in minutes (1440 minutes = one day)
        fs.trash.interval: how long deleted files are kept in the trash, in minutes (4320 = 3 days)
        fs.trash.checkpoint.interval: interval between trash checkpoints; should be less than or equal to the value above
    -->
    <property>
        <name>fs.trash.interval</name>
        <value>4320</value>
    </property>
    <property>
        <name>fs.trash.checkpoint.interval</name>
        <value>4320</value>
    </property>
    <!-- Hadoop temporary directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/hdfs/hadoop/tmp</value>
    </property>

    <property>
        <name>hadoop.proxyuser.hadoop.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.groups</name>
        <value>*</value>
    </property>

    <property>
        <name>hadoop.native.lib</name>
        <value>true</value>
    </property>

    <!-- Hadoop authorization and authentication -->
    <property>
        <name>hadoop.security.authorization</name>
        <value>true</value>
    </property>
    <property>
        <name>hadoop.security.authentication</name>
        <value>simple</value>
    </property>
    <!-- Static user for browsing data from the web UI; the default is dr.who -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>hdfs</value>
    </property>

    <!-- ZooKeeper quorum addresses -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>cdh50-121:2181,cdh50-122:2181,cdh50-127:2181</value>
    </property>
</configuration>
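Once the cluster is running (section V), the effective values can be spot-checked with hdfs getconf; a sketch using the keys set above:
hdfs getconf -confKey fs.defaultFS
hdfs getconf -confKey fs.trash.interval
hdfs getconf -confKey ha.zookeeper.quorum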
  3. hdfs-site.xml
<configuration>
    <!-- HA mode -->
    <!-- NameNode metadata storage path -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/hdfs/hadoop/dfs/name</value>
    </property>
    <!-- DataNode data storage path -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/hdfs/hadoop/dfs/data</value>
    </property>

    <!-- Number of replicas -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>

    <!-- Enable WebHDFS (the REST-based interface) -->
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>

    <property>
        <name>dfs.permissions.superusergroup</name>
        <value>hdfs</value>
    </property>

    <!-- Disable permission checking -->
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>

    <!-- HDFS HA configuration below -->
    <!-- Set the HDFS nameservice to ns1; this must match the value in core-site.xml -->
    <property>
        <name>dfs.nameservices</name>
        <value>ns1</value>
    </property>

    <!-- ns1 has two NameNodes: nn1 and nn2 -->
    <property>
        <name>dfs.ha.namenodes.ns1</name>
        <value>nn1,nn2</value>
    </property>
    <!-- RPC address of nn1 -->
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn1</name>
        <value>cdh50-121:8020</value>
    </property>
    <!-- RPC address of nn2 -->
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn2</name>
        <value>cdh50-122:8020</value>
    </property>

    <!-- HTTP address of nn1 -->
    <property>
        <name>dfs.namenode.http-address.ns1.nn1</name>
        <value>cdh50-121:9870</value>
    </property>
    <!-- HTTP address of nn2 -->
    <property>
        <name>dfs.namenode.http-address.ns1.nn2</name>
        <value>cdh50-122:9870</value>
    </property>

    <!--
        Location where the NameNode edit logs are stored on the JournalNodes.
        The JournalNode process is very lightweight and can be co-located on other servers.
        Note: at least 3 nodes are required. More may be run, but the count must be odd (3, 5, 7, 9, ...).
        With N nodes, the system tolerates up to (N-1)/2 node failures (N must be at least 3) without affecting normal operation.
    -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://cdh50-121:8485;cdh50-122:8485;cdh50-127:8485/ns1</value>
    </property>

    <!-- Local disk directory where the JournalNode stores its edit files -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/hdfs/hadoop/journaldata</value>
    </property>

    <!-- Enable automatic NameNode failover -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>

    <!-- Failover proxy provider (how clients find the active NameNode) -->
    <property>
        <name>dfs.client.failover.proxy.provider.ns1</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <!--
     Fencing methods; multiple methods are separated by newlines, one per line.
     In an HDFS HA cluster there are two NameNodes. To prevent both from being Active at the same time (a split-brain scenario),
     the Standby node runs a series of methods during failover to kill the unhealthy NameNode process on the previously Active node (this is called fencing).
     The setting below configures the methods used to kill the old Active NameNode.
     All methods listed here are executed in order, and the final result is the result of the fencing process.
     If fencing succeeds, the formerly Standby NameNode is promoted to Active.
     The sshfence method uses SSH to remotely run fuser to find the NameNode process and kill it.
     The goal here is that on failover, even if sshfence fails (for example, fuser is not installed on the server),
     the Standby node is still promoted to Active.
     Therefore shell(/bin/true) is configured last, so that regardless of how the earlier methods fare, the fencing result is always true.
     dfs.ha.fencing.ssh.private-key-files specifies the private key used by the ssh command.
     -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
            sshfence
            shell(/bin/true)
        </value>
    </property>
    <!-- sshfence requires passwordless SSH; specify the location of the private key -->
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hdfs/.ssh/id_rsa</value>
    </property>

    <!-- sshfence connection timeout (milliseconds) -->
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>

    <!-- Disk-balancing parameters could be added here -->
</configuration>
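The local directories referenced above are normally created by the NameNode, DataNode, and JournalNode themselves, but creating them up front on every node makes ownership explicit; a sketch using the paths assumed in this guide:
mkdir -p /home/hdfs/hadoop/dfs/name /home/hdfs/hadoop/dfs/data /home/hdfs/hadoop/journaldata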
  4. capacity-scheduler.xml
<configuration>

  <property>
    <name>yarn.scheduler.capacity.maximum-applications</name>
    <value>10000</value>
    <description>
      Maximum number of applications that can be pending and running.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.2</value>
    <description>
      How much of the cluster's resources may be used to run ApplicationMasters, which effectively limits the number of concurrently active applications. The default is 10%.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
    <description>
      Resource calculator:
      DefaultResourceCalculator only takes memory into account.
      DominantResourceCalculator takes both memory and CPU into account.
    </description>
  </property>

  <!-- Queue definitions -->
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>prod,dev,default</value>
  </property>

  <!-- Queue capacity percentages -->
  <property>
    <name>yarn.scheduler.capacity.root.prod.capacity</name>
    <value>60</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.capacity</name>
    <value>30</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>10</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.node-locality-delay</name>
    <value>40</value>
    <description>
      Number of missed scheduling opportunities after which the CapacityScheduler 
      attempts to schedule rack-local containers.
      When setting this parameter, the size of the cluster should be taken into account.
      We use 40 as the default value, which is approximately the number of nodes in one rack.
      Note, if this value is -1, the locality constraint in the container request
      will be ignored, which disables the delay scheduling.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.rack-locality-additional-delay</name>
    <value>-1</value>
    <description>
      Number of additional missed scheduling opportunities over the node-locality-delay
      ones, after which the CapacityScheduler attempts to schedule off-switch containers,
      instead of rack-local ones.
      Example: with node-locality-delay=40 and rack-locality-delay=20, the scheduler will
      attempt rack-local assignments after 40 missed opportunities, and off-switch assignments
      after 40+20=60 missed opportunities.
      When setting this parameter, the size of the cluster should be taken into account.
      We use -1 as the default value, which disables this feature. In this case, the number
      of missed opportunities for assigning off-switch containers is calculated based on
      the number of containers and unique locations specified in the resource request,
      as well as the size of the cluster.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.queue-mappings</name>
    <value></value>
    <description>
      A list of mappings that will be used to assign jobs to queues
      The syntax for this list is [u|g]:[name]:[queue_name][,next mapping]*
      Typically this list will be used to map users to queues,
      for example, u:%user:%user maps all users to queues with the same name
      as the user.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
    <value>false</value>
    <description>
      If a queue mapping is present, will it override the value specified
      by the user? This can be used by administrators to place jobs in queues
      that are different than the one specified by the user.
      The default is false.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.per-node-heartbeat.maximum-offswitch-assignments</name>
    <value>1</value>
    <description>
      Controls the number of OFF_SWITCH assignments allowed
      during a node's heartbeat. Increasing this value can improve
      scheduling rate for OFF_SWITCH containers. Lower values reduce
      "clumping" of applications on particular nodes. The default is 1.
      Legal values are 1-MAX_INT. This config is refreshable.
    </description>
  </property>


  <property>
    <name>yarn.scheduler.capacity.application.fail-fast</name>
    <value>false</value>
    <description>
      Whether RM should fail during recovery if previous applications'
      queue is no longer valid.
    </description>
  </property>

  <!-- Submission ACL control, disabled for now -->
  <!-- Submit permission is looked up from the leaf queue upward, so the root queue must be set to a blank value, which denies all submissions at the root level
  <property>
	  <name>yarn.scheduler.capacity.root.acl_submit_applications</name>  
	  <value> </value>
	  <description></description>
  </property> 
  <property>
      <name>yarn.scheduler.capacity.root.acl_administer_queue</name>  
	  <value> </value>
  </property>-->

</configuration>
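After editing the queue definitions above on a running cluster, the scheduler can reload them without restarting the ResourceManager, and the resulting queues can be listed; a sketch:
yarn rmadmin -refreshQueues
mapred queue -list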
  5. mapred-site.xml
<configuration>
    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    <!-- RPC address of the JobHistory server.
        A single history server on any one machine is enough; multiple are not needed because the history data is stored on HDFS.
	-->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>cdh50-121:10020</value>
    </property>
    <!-- HTTP address of the JobHistory server -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>cdh50-121:19888</value>
    </property>

    <!-- Enable uber mode (an optimization for small jobs)
    <property>
    	<name>mapreduce.job.ubertask.enable</name>
		<value>false</value>
		<description>Whether to enable uber mode</description>
    </property>-->

    <!-- Maximum number of map tasks for a job to qualify for uber mode
    <property>
        <name>mapreduce.job.ubertask.maxmaps</name>
        <value>9</value>
        <description>Maximum number of maps; the default is 9</description>
    </property>-->

    <!-- Maximum number of reduce tasks for a job to qualify for uber mode
    <property>
        <name>mapreduce.job.ubertask.maxreduces</name>
        <value>1</value>
        <description>Maximum number of reduces; the default is 1</description>
    </property>-->

    <!--
    <property>
        <name>mapreduce.job.ubertask.maxbytes</name>
        <value></value>
        <description>Maximum input size in bytes; the default is the block size, i.e. 128 MB</description>
    </property>
    -->

    <property>
        <name>mapreduce.jobtracker.http.address</name>
        <value>0.0.0.0:50030</value>
    </property>

    <!-- Job history directories on HDFS -->
    <property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>/jobhistory/done</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>/jobhistory/done_intermediate</value>
    </property>

    <!-- Map/reduce memory settings -->
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>4096</value>
    </property>
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>8192</value>
    </property>
    <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx3072M</value>
        <description>Must not exceed mapreduce.map.memory.mb</description>
    </property>
    <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx6144M</value>
        <description>Must not exceed mapreduce.reduce.memory.mb</description>
    </property>
    <property>
        <name>mapreduce.task.io.sort.mb</name>
        <value>1024</value>
    </property>
    <property>
        <name>mapreduce.task.io.sort.factor</name>
        <value>100</value>
    </property>
    <property>
        <name>mapreduce.reduce.shuffle.parallelcopies</name>
        <value>50</value>
    </property>

    <!-- Queue names that may be specified when submitting jobs -->
    <property>
        <name>mapred.acls.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>mapred.queue.names</name>
        <value>default</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>
            $HADOOP_HOME/etc/hadoop,
            $HADOOP_HOME/share/hadoop/common/*,
            $HADOOP_HOME/share/hadoop/common/lib/*,
            $HADOOP_HOME/share/hadoop/hdfs/*,
            $HADOOP_HOME/share/hadoop/hdfs/lib/*,
            $HADOOP_HOME/share/hadoop/mapreduce/*,
            $HADOOP_HOME/share/hadoop/mapreduce/lib/*,
            $HADOOP_HOME/share/hadoop/yarn/*,
            $HADOOP_HOME/share/hadoop/yarn/lib/*
        </value>
    </property>
</configuration>
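Once DFS and YARN are up (section V), the MapReduce settings can be verified end to end with the bundled example job; a minimal sketch, assuming the stock Hadoop 3.1.3 jar location and the prod queue defined earlier:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar pi -Dmapreduce.job.queuename=prod 5 10
mapred job -list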
  6. yarn-site.xml
<configuration>
    <!-- Site specific YARN configuration properties -->
    <!-- Enable ResourceManager HA -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <!-- RM cluster id -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yarn-na</value>
    </property>
    <!-- Logical ids of the RMs -->
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>

    <!-- Per-node RM id
    <property>
        <name>yarn.resourcemanager.ha.id</name>
        <value>rm1</value>
        <description>Configure separately on each RM node, e.g. rm1 on the rm1 host and rm2 on the rm2 host</description>
    </property>
    -->

    <!-- Enable RM state recovery -->
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>

    <!-- Enable NodeManager restart/recovery -->
    <property>
        <name>yarn.nodemanager.recovery.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.nodemanager.recovery.dir</name>
        <value>/home/hdfs/hadoop/storage/tmp/yarn-nm-recovery</value>
    </property>

    <!-- NodeManager IPC address and port -->
    <property>
        <name>yarn.nodemanager.address</name>
        <value>0.0.0.0:45454</value>
    </property>


    <!--
    Class used for RM state storage. The default is org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore, which is backed by a Hadoop file system.
    org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore, backed by ZooKeeper, can be used instead.
    -->
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
    <!--
        Maximum number of application attempts.
        This is a global setting for all ApplicationMasters;
        each ApplicationMaster can specify its own maximum via the API,
        but it cannot exceed this global cap; if it does, the RM uses the global value.
        The default is 2, which allows the AM at least one retry.
    -->
    <property>
        <name>yarn.resourcemanager.am.max-attempts</name>
        <value>3</value>
    </property>


    <!-- ZooKeeper cluster address -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>cdh50-121:2181,cdh50-122:2181,cdh50-127:2181</value>
    </property>
    <!-- Parent znode under which the RM state is stored in ZooKeeper -->
    <property>
        <name>yarn.resourcemanager.zk-state-store.parent-path</name>
        <value>/data/hadoop/rmstore</value>
    </property>


    <!-- RM1 configuration -->
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>cdh50-121</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address.rm1</name>
        <value>cdh50-121:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
        <value>cdh50-121:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address.rm1</name>
        <value>cdh50-121:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address.rm1</name>
        <value>cdh50-121:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>cdh50-121:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.https.address.rm1</name>
        <value>cdh50-121:8090</value>
    </property>


    <!-- RM2 configuration -->
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>cdh50-122</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address.rm2</name>
        <value>cdh50-122:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
        <value>cdh50-122:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address.rm2</name>
        <value>cdh50-122:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address.rm2</name>
        <value>cdh50-122:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>cdh50-122:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.https.address.rm2</name>
        <value>cdh50-122:8090</value>
    </property>

    <!-- Auxiliary service run by the NodeManager; must be set to mapreduce_shuffle to run MapReduce jobs -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <!-- YARN log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <!-- Log retention time in seconds; -1 disables deletion (1209600 seconds = 14 days) -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>1209600</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-check-interval-seconds</name>
        <value>86400</value>
    </property>
    <!-- HDFS directory to which logs are moved after an application finishes -->
    <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/home/hdfs/logs</value>
    </property>
    <property>
        <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
        <value>logs</value>
    </property>


    <!-- YARN memory allocation -->
    <!-- Physical memory (MB) available to the NodeManager for containers -->
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>508723</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>2.1</value>
    </property>

    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>

    <!-- Maximum vcores available per node -->
    <property>
        <description>Number of vcores that can be allocated
            for containers. This is used by the RM scheduler when allocating
            resources for containers. This is not used to limit the number of
            physical cores used by YARN containers.
        </description>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>70</value>
    </property>

    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>2048</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>40960</value>
    </property>

    <property>
        <name>yarn.scheduler.minimum-allocation-vcores</name>
        <value>1</value>
    </property>

    <!-- Maximum vcores per container -->
    <property>
        <name>yarn.scheduler.maximum-allocation-vcores</name>
        <value>10</value>
    </property>

    <property>
        <name>yarn.app.mapreduce.am.resource.mb</name>
        <value>40960</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.command-opts</name>
        <value>-Xmx32768m</value>
    </property>


    <!-- Scheduler (queue) configuration -->
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    </property>

    <property>
        <name>yarn.log.server.url</name>
        <value>http://cdh50-121:19888/jobhistory/logs</value>
    </property>
</configuration>
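After YARN starts (section V), the ResourceManager HA state and the registered NodeManagers can be checked from the command line; a sketch using the rm ids configured above:
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
yarn node -list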
  7. workers
cdh50-121
cdh50-122
cdh50-127

IV. Distribute and deploy

After the configuration is done, distribute it to the other machines. For example, if the changes above were made on cdh50-121, copy the configuration to cdh50-122 and cdh50-127:

scp -r ~/hadoop hdfs@cdh50-122:~/
scp -r ~/hadoop hdfs@cdh50-127:~/
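The shell environment from section II is per machine, so if ~/.bashrc was only edited on cdh50-121 it can be copied along as well; a sketch:
scp ~/.bashrc hdfs@cdh50-122:~/
scp ~/.bashrc hdfs@cdh50-127:~/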

V. Startup

  1. Run start-dfs.sh. Since the environment variables are configured, it can be run from anywhere; the script is in the sbin directory of the Hadoop distribution.
  2. On the first startup the NameNode will fail to start because the HDFS file system has not yet been formatted. Format it first on the first NameNode machine with hdfs namenode -format, then on the second NameNode machine run hdfs namenode -bootstrapStandby. If formatting reports an error, the logs will usually show that existing JournalNode files are the cause; delete the corresponding directory on all three machines, e.g. rm -rf /home/hdfs/hadoop/journaldata/ns1 (depending on the configured directory), and format again. A condensed command sketch follows this list.
  3. Check that the VERSION files of the two NameNodes match. The path comes from dfs.namenode.name.dir in hdfs-site.xml; here it is /home/hdfs/hadoop/dfs/name, which contains a current directory. Compare the clusterID in current/VERSION on both NameNode nodes: if they differ, something is wrong and the DataNodes will fail to start properly.
  4. Once the configuration is correct, stop DFS with stop-dfs.sh, then restart DFS and afterwards start YARN: run start-dfs.sh first and, after it succeeds, start-yarn.sh.
  5. Start the JobHistoryServer process with mapred --daemon start historyserver
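A condensed sketch of the first-time startup sequence described above. The hdfs zkfc -formatZK step is an assumption not listed in the steps above: it is commonly needed once to initialize the failover znode in ZooKeeper when automatic failover is enabled.
# on cdh50-121: the first start-dfs.sh brings up the JournalNodes (the NameNodes fail until HDFS is formatted)
start-dfs.sh
hdfs namenode -format
hdfs zkfc -formatZK        # assumption: one-time initialization of the HA znode in ZooKeeper
# on cdh50-122 (second NameNode)
hdfs namenode -bootstrapStandby
# back on cdh50-121: restart DFS, then start YARN and the history server
stop-dfs.sh
start-dfs.sh
start-yarn.sh
mapred --daemon start historyserver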

VI. Check the startup logs and the running processes

  1. On all three nodes, check all of the logs in the log directory /home/hdfs/hadoop/logs/
  2. On all three nodes, run the JDK's jps command to list the Java processes; depending on the node's role, the following should appear (a verification sketch follows this list)
    HDFS processes:
    JournalNode
    NameNode
    DataNode
    DFSZKFailoverController
    JobHistoryServer
    YARN processes:
    NodeManager
    ResourceManager
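To confirm the HA roles and cluster health after startup, the standard admin commands can be used; a sketch (nn1/nn2 are the NameNode ids configured in hdfs-site.xml):
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
hdfs dfsadmin -report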