Setting Up a Single Node Cluster (based on vanilla Hadoop 2.6.5)

Purpose

This document describes how to set up and configure a single-node Hadoop installation so that you can quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).

Prerequisites

  • Supported Platforms

GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
Windows is also a supported platform, but the following steps apply only to Linux. To set up Hadoop on Windows, see the wiki page: https://wiki.apache.org/hadoop/Hadoop2OnWindows

  • Required Software

1. Java™ must be installed. Recommended Java versions are described at HadoopJavaVersions.
2. ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons. (A quick way to verify both is sketched below.)
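
A quick way to confirm both prerequisites on a typical Linux machine (exact package names and service management differ by distribution, so treat this as a sketch):

$ java -version
$ ssh -V
$ pgrep sshd

The first two commands print version strings if Java and the ssh client are installed; the last prints one or more process IDs if sshd is running.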

Download

To get a Hadoop distribution, download a recent stable release from one of the Apache Download Mirrors:

http://www.apache.org/dyn/closer.cgi/hadoop/common/
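
For example, to fetch and unpack the 2.6.5 release used throughout this guide (the archive URL below is one option; any mirror offered by the link above works just as well):

$ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.6.5/hadoop-2.6.5.tar.gz
$ tar -xzf hadoop-2.6.5.tar.gz
$ cd hadoop-2.6.5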

Prepare to Start the Hadoop Cluster

Unpack the downloaded Hadoop distribution. In the distribution, edit the file etc/hadoop/hadoop-env.sh to define some parameters as follows:

# set to the root of your Java installation
export JAVA_HOME=/usr/java/latest

# Assuming your installation directory is /usr/local/hadoop
export HADOOP_PREFIX=/usr/local/hadoop

Then try the following command:

$ bin/hadoop

This will display the usage documentation for the hadoop script.
You are now ready to start your Hadoop cluster in one of the three supported modes:
* Local (Standalone) Mode
* Pseudo-Distributed Mode
* Fully-Distributed Mode

Standalone Operation (Local Mode)

By default, Hadoop is configured to run in non-distributed mode, as a single Java process. This is useful for debugging.

The following example copies the unpacked conf directory to use as input, then finds and displays every match of the given regular expression. Output is written to the given output directory.

$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar grep input output 'dfs[a-z.]+'
$ cat output/*
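
With the stock 2.6.5 configuration files as input, the job typically finds exactly one match, so the final command usually prints something like:

1       dfsadmin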

Pseudo-Distributed Operation

Hadoop can also be run on a single node in pseudo-distributed mode, where each Hadoop daemon runs in a separate Java process.

Configuration

Use the following:

  • etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

  • etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

Setup passphraseless ssh

Now check that you can ssh to the localhost without a passphrase:

$ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
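
If the ssh test above still prompts for a password after these commands, the usual culprit is file permissions; on many systems sshd refuses keys whose authorized_keys file is group- or world-readable, so tightening it (as later Hadoop docs also suggest) typically helps:

$ chmod 0600 ~/.ssh/authorized_keys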

Execution

The following instructions run a MapReduce job locally. If you want to execute a job on YARN, see YARN on a Single Node below.

Format the filesystem:

$ bin/hdfs namenode -format

Start NameNode daemon and DataNode daemon:

$ sbin/start-dfs.sh

The hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).
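
A simple way to confirm that the daemons actually started (a common sanity check, not part of the original steps) is the JDK's jps tool, which lists running Java processes:

$ jps

You should see NameNode, DataNode, and SecondaryNameNode among the listed processes.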

Browse the web interface for the NameNode; by default it is available at:
NameNode - http://localhost:50070/

Make the HDFS directories required to execute MapReduce jobs:

$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
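
As an aside, hdfs dfs -mkdir also accepts a -p flag (like the Unix mkdir -p), so the two commands above can be collapsed into one:

$ bin/hdfs dfs -mkdir -p /user/<username>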

Copy the input files into the distributed filesystem:

$ bin/hdfs dfs -put etc/hadoop input
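
Because the destination path "input" is relative, the files land under your HDFS home directory, i.e. /user/<username>/input. You can verify the copy with:

$ bin/hdfs dfs -ls input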

Run some of the examples provided:

$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar grep input output 'dfs[a-z.]+'

Examine the output files:
Copy the output files from the distributed filesystem to the local filesystem and examine them:

$ bin/hdfs dfs -get output output
$ cat output/*

or

View the output files on the distributed filesystem:

$ bin/hdfs dfs -cat output/*

When you’re done, stop the daemons with:

$ sbin/stop-dfs.sh

YARN on a Single Node

You can run a MapReduce job on YARN in pseudo-distributed mode by setting a few parameters and additionally running the ResourceManager daemon and NodeManager daemon.

The following instructions assume that steps 1-4 of the above instructions have already been executed.

1. Configure parameters as follows:

  • etc/hadoop/mapred-site.xml:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

  • etc/hadoop/yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

2. Start ResourceManager daemon and NodeManager daemon:

$ sbin/start-yarn.sh

3. Browse the web interface for the ResourceManager; by default it is available at:

ResourceManager - http://localhost:8088/

4. Run a MapReduce job.
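
For example, you can rerun the earlier grep example, now scheduled by YARN instead of the local runner. A sketch, assuming the same input directory as before; note that the job will fail if the output directory already exists, so remove the one left over from the earlier run first:

$ bin/hdfs dfs -rm -r output
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar grep input output 'dfs[a-z.]+'
$ bin/hdfs dfs -cat output/*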

When you’re done, stop the daemons with:

$ sbin/stop-yarn.sh