This post covers the most minimal Hadoop configuration, just enough to let you test your own programs locally; for fuller configurations see the links at the end of the article.
I had previously installed hadoop, hive and hbase in a virtual machine, but after reinstalling Windows I had to set everything up again. The VM was just too cumbersome, so I switched to Cygwin. It is a nice little tool: you get a lot of the Linux userland on Windows, and it also makes checking project logs locally very convenient, e.g. tail -fn 20 xxx.log to watch the log output.
I. Cygwin installation
Installation is simple, just like any other Windows software, but version 1.7 has one problem:
Cygwin 1.7 changed how file-owner attributes are handled, which leads to errors when debugging locally: bash cannot be executed.
Workaround:
1. Edit /etc/fstab and add: none /cygdrive cygdrive binary,user,noacl,posix=0 0 0 (a quick way to check this is sketched after this list)
2. Edit /etc/passwd and add: sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
3. As I recall I also had to reconfigure passwordless login once more; the keys live under administrator/.ssh (see section II below).
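To verify that the noacl option from step 1 actually took effect, the standard Cygwin mount command lists the active mount table; a quick check might look like this (just a sanity check, not part of the original steps):
$ mount | grep /cygdrive    # the options shown should now include noacl and posix=0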
II. SSH installation
[b]Be sure to run Cygwin as Administrator.[/b]
If the ssh service fails to start, run the following commands in order:
$ mkpasswd -l > /etc/passwd
$ mkgroup -l > /etc/group
$ cygrunsrv -R sshd      # remove the sshd service
$ ssh-host-config -y     # reconfigure
$ cygrunsrv -S sshd      # start the service
Administrator@backup ~
$ chmod +r /etc/group    # fix file permissions
Administrator@backup ~
$ chmod +r /etc/passwd   # fix file permissions
Administrator@backup ~
$ chmod +rwx /var/       # fix directory permissions
Administrator@backup ~
$ ssh-host-config        # guided SSH service configuration
*** Info: Generating /etc/ssh_host_key
*** Info: Generating /etc/ssh_host_rsa_key
*** Info: Generating /etc/ssh_host_dsa_key
*** Info: Creating default /etc/ssh_config file
*** Info: Creating default /etc/sshd_config file
*** Info: Privilege separation is set to yes by default since OpenSSH 3.3.
*** Info: However, this requires a non-privileged account called 'sshd'.
*** Info: For more info on privilege separation read /usr/share/doc/openssh/README.privsep.
*** Query: Should privilege separation be used? (yes/no) yes    # enter yes
*** Info: Updating /etc/sshd_config file
*** Warning: The following functions require administrator privileges!
*** Query: Do you want to install sshd as a service?
*** Query: (Say "no" if it is already installed as a service) (yes/no) yes    # enter yes
*** Info: Note that the CYGWIN variable must contain at least "ntsec"
*** Info: for sshd to be able to change user context without password.
*** Query: Enter the value of CYGWIN for the daemon: [ntsec] ntsec    # enter ntsec
*** Info: On Windows Server 2003, Windows Vista, and above, the
*** Info: SYSTEM account cannot setuid to other users -- a capability
*** Info: sshd requires. You need to have or to create a privileged
*** Info: account. This script will help you do so.
*** Info: You appear to be running Windows 2003 Server or later. On 2003 and
*** Info: later systems, it's not possible to use the LocalSystem account
*** Info: for services that can change the user id without an explicit password
*** Info: (such as passwordless logins [e.g. public key authentication] via sshd).
*** Info: If you want to enable that functionality, it's required to create a new
*** Info: account with special privileges (unless a similar account already exists).
*** Info: This account is then used to run these special servers.
*** Info: Note that creating a new user requires that the current account have
*** Info: Administrator privileges itself.
*** Info: No privileged account could be found.
*** Info: This script plans to use 'cyg_server'.
*** Info: 'cyg_server' will only be used by registered services.
*** Query: Do you want to use a different name? (yes/no) no    # enter no, don't specify a separate service user
*** Query: Create new privileged user account 'cyg_server'? (yes/no) no    # enter no, don't specify a separate service user
*** ERROR: There was a serious problem creating a privileged user.
*** Query: Do you want to proceed anyway? (yes/no) yes    # enter yes
*** Warning: Expected privileged user 'cyg_server' does not exist.
*** Warning: Defaulting to 'SYSTEM'
*** Info: The sshd service has been installed under the LocalSystem
*** Info: account (also known as SYSTEM). To start the service now, call
*** Info: `net start sshd' or `cygrunsrv -S sshd'. Otherwise, it
*** Info: will start automatically after the next reboot.
*** Info: Host configuration finished. Have fun!
Administrator@backup ~
$ cygrunsrv.exe -S sshd    # start the SSH service
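To double-check that the service really is registered and running before moving on, a quick query (cygrunsrv -Q asks Windows for the service status; this is just a sanity check, not part of the original transcript):
$ cygrunsrv -Q sshd         # "Current State" should read Running
$ net start | grep -i sshd  # the CYGWIN sshd service should show up among the started services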
3. Configure passwordless sshd login
In a terminal window, run ssh-keygen to generate the key files; this produces id_rsa.pub.
cp id_rsa.pub authorized_keys
Then close the terminal, open it again, and run ssh localhost.
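Putting the whole key setup together, a minimal sketch (assuming the default ~/.ssh location and an RSA key with an empty passphrase, which is what passwordless login needs):
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa    # generate a key pair with no passphrase
$ cd ~/.ssh
$ cp id_rsa.pub authorized_keys               # authorize our own public key
$ chmod 600 authorized_keys                   # sshd can refuse keys with loose permissions
$ ssh localhost                               # should now log in without asking for a password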
III. Hadoop installation (hadoop-0.19.1-dc)
1. Download the Hadoop tarball and extract it to a directory.
2. Configure the environment. I only did the bare minimum, just enough for the Hadoop services to come up; the key settings are the jobtracker (mapred.job.tracker) and the namenode (fs.default.name).
hadoop-env.sh (environment variables)
export JAVA_HOME=/cygdrive/D/java_tools/jdk1.6.0_29
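A quick way to confirm the path is right (Cygwin maps D:\java_tools\... to /cygdrive/D/java_tools/...; a JDK path without spaces, like this one, avoids quoting headaches in hadoop-env.sh):
$ /cygdrive/D/java_tools/jdk1.6.0_29/bin/java -version    # should print the 1.6.0_29 version banner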
hadoop-site.xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://127.0.0.1:9100</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/songpo/hive/hive-1.1.4/hadoop-0.19</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/home/songpo/hive/hadoop-0.19/hdfs/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/home/songpo/hive/hadoop-0.19/hdfs/data</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>127.0.0.1:9101</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>
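For reference, these <property> entries are not a complete file on their own: they go inside the <configuration> element of conf/hadoop-site.xml, which overrides the defaults from hadoop-default.xml. A minimal skeleton (with the property blocks above elided):
<?xml version="1.0"?>
<!-- conf/hadoop-site.xml: site-specific settings, overriding hadoop-default.xml -->
<configuration>
  <!-- paste the <property> blocks listed above here -->
</configuration>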
With that, Hadoop is configured.
3. If I remember right, you then need to format the namenode:
hadoop namenode -format
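One caveat: formatting wipes whatever is under dfs.name.dir, so only run it on a fresh setup. Run it from the Hadoop install directory, e.g. (the install path here is just an example):
$ cd /cygdrive/d/hadoop-0.19.1-dc    # hypothetical install directory
$ bin/hadoop namenode -format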
4. Start the Hadoop services
bin> sh start-all.sh
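To see whether the daemons actually came up, jps (which ships with the JDK) lists the running Java processes; on a healthy single-node start you would expect something along these lines:
$ jps
# expect NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker (plus Jps itself)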
5. Test
hadoop fs -ls /
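A slightly fuller smoke test, assuming the daemons are up (the HDFS paths are arbitrary examples, and the examples jar name may differ in the -dc build):
$ bin/hadoop fs -mkdir /tmp/input
$ bin/hadoop fs -put conf/hadoop-site.xml /tmp/input
$ bin/hadoop fs -ls /tmp/input
$ bin/hadoop jar hadoop-*-examples.jar wordcount /tmp/input /tmp/output    # exercises the jobtracker too
$ bin/hadoop fs -cat /tmp/output/part-00000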
IV. For debugging in Eclipse, see [url]http://v-lad.org/Tutorials/Hadoop/10%20-%20configure%20hadoop.html[/url]