1. Check whether the Hadoop processes have started, using /usr/jdk1.7.0_51/bin/jps. On the master you should see:
ubuntu@ubuntu-K50ID:~/hadoop-1.2.1$ /usr/jdk1.7.0_51/bin/jps
25174 SecondaryNameNode
25263 JobTracker
25366 Jps
24911 NameNode
On the slaves you should see:
ubuntu@ubuntu1-VirtualBox:~/hadoop-1.2.1$ /usr/jdk1.7.0_51/bin/jps
5643 DataNode
5834 TaskTracker
5880 Jps
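This check can also be scripted. The sketch below (a hypothetical helper, not part of the Hadoop distribution) scans a captured jps listing for the Hadoop 1.x master daemons; in practice you would set jps_out=$(jps) instead of the sample text:

```shell
# Sketch: verify the expected Hadoop 1.x master daemons appear in jps output.
# jps_out holds a captured sample here; on a live node use: jps_out=$(jps)
jps_out="25174 SecondaryNameNode
25263 JobTracker
25366 Jps
24911 NameNode"

for daemon in NameNode SecondaryNameNode JobTracker; do
    if echo "$jps_out" | grep -qw "$daemon"; then
        echo "$daemon: running"
    else
        echo "$daemon: MISSING"
    fi
done
```

On a slave, the list to check would be DataNode and TaskTracker instead.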
2. In /home/ubuntu, create a directory named input and generate two test files inside it:
mkdir input
echo "hello world" > input/test1.txt
echo "hello hadoop" > input/test2.txt
From /home/ubuntu, run the following command (it can be run from any directory, as long as the path to input is correct):
/home/ubuntu/hadoop-1.2.1/bin/hadoop fs -put input ./in
List the contents of the in directory (the extra in/input entry in the listing below suggests -put was run a second time after in already existed):
ubuntu@ubuntu-K50ID:~$ /home/ubuntu/hadoop-1.2.1/bin/hadoop fs -ls ./in/
Found 3 items
drwxr-xr-x - ubuntu supergroup 0 2014-03-08 11:46 /user/ubuntu/in/input
-rw-r--r-- 1 ubuntu supergroup 12 2014-02-17 20:56 /user/ubuntu/in/test1.txt
-rw-r--r-- 1 ubuntu supergroup 13 2014-02-17 20:56 /user/ubuntu/in/test2.txt
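The file sizes shown in the HDFS listing (12 and 13 bytes) can be cross-checked locally: echo appends a trailing newline, so "hello world" occupies 12 bytes and "hello hadoop" 13. A quick local sketch (it recreates the two files in the current directory):

```shell
# Cross-check: echo/printf append a newline, so "hello world" -> 12 bytes
# and "hello hadoop" -> 13 bytes, matching the HDFS -ls output above.
printf 'hello world\n'  > test1.txt
printf 'hello hadoop\n' > test2.txt
wc -c test1.txt test2.txt
```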
3. Run the wordcount example to count the occurrences of each word in the files under in:
ubuntu@ubuntu-K50ID:~/hadoop-1.2.1$ bin/hadoop jar hadoop-examples-1.2.1.jar wordcount in out
14/03/08 12:01:39 INFO input.FileInputFormat: Total input paths to process : 2
14/03/08 12:01:39 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/03/08 12:01:39 WARN snappy.LoadSnappy: Snappy native library not loaded
14/03/08 12:01:40 INFO mapred.JobClient: Running job: job_201403081130_0004
14/03/08 12:01:41 INFO mapred.JobClient: map 0% reduce 0%
14/03/08 12:01:56 INFO mapred.JobClient: map 100% reduce 0%
14/03/08 12:02:15 INFO mapred.JobClient: map 100% reduce 100%
14/03/08 12:02:16 INFO mapred.JobClient: Job complete: job_201403081130_0004
14/03/08 12:02:16 INFO mapred.JobClient: Counters: 29
14/03/08 12:02:16 INFO mapred.JobClient: Job Counters
14/03/08 12:02:16 INFO mapred.JobClient: Launched reduce tasks=1
14/03/08 12:02:16 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=23911
14/03/08 12:02:16 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/03/08 12:02:16 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/03/08 12:02:16 INFO mapred.JobClient: Launched map tasks=2
14/03/08 12:02:16 INFO mapred.JobClient: Data-local map tasks=2
14/03/08 12:02:16 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=17925
14/03/08 12:02:16 INFO mapred.JobClient: File Output Format Counters
14/03/08 12:02:16 INFO mapred.JobClient: Bytes Written=25
14/03/08 12:02:16 INFO mapred.JobClient: FileSystemCounters
14/03/08 12:02:16 INFO mapred.JobClient: FILE_BYTES_READ=55
14/03/08 12:02:16 INFO mapred.JobClient: HDFS_BYTES_READ=253
14/03/08 12:02:16 INFO mapred.JobClient: FILE_BYTES_WRITTEN=171697
14/03/08 12:02:16 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=25
14/03/08 12:02:16 INFO mapred.JobClient: File Input Format Counters
14/03/08 12:02:16 INFO mapred.JobClient: Bytes Read=25
14/03/08 12:02:16 INFO mapred.JobClient: Map-Reduce Framework
14/03/08 12:02:16 INFO mapred.JobClient: Map output materialized bytes=61
14/03/08 12:02:16 INFO mapred.JobClient: Map input records=2
14/03/08 12:02:16 INFO mapred.JobClient: Reduce shuffle bytes=61
14/03/08 12:02:16 INFO mapred.JobClient: Spilled Records=8
14/03/08 12:02:16 INFO mapred.JobClient: Map output bytes=41
14/03/08 12:02:16 INFO mapred.JobClient: Total committed heap usage (bytes)=415674368
14/03/08 12:02:16 INFO mapred.JobClient: CPU time spent (ms)=2410
14/03/08 12:02:16 INFO mapred.JobClient: Combine input records=4
14/03/08 12:02:16 INFO mapred.JobClient: SPLIT_RAW_BYTES=228
14/03/08 12:02:16 INFO mapred.JobClient: Reduce input records=4
14/03/08 12:02:16 INFO mapred.JobClient: Reduce input groups=3
14/03/08 12:02:16 INFO mapred.JobClient: Combine output records=4
14/03/08 12:02:16 INFO mapred.JobClient: Physical memory (bytes) snapshot=319578112
14/03/08 12:02:16 INFO mapred.JobClient: Reduce output records=3
14/03/08 12:02:16 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1042714624
14/03/08 12:02:16 INFO mapred.JobClient: Map output records=4
List the output directory; the presence of _SUCCESS indicates the job completed successfully:
ubuntu@ubuntu-K50ID:~/hadoop-1.2.1$ bin/hadoop fs -ls ./out
Found 3 items
-rw-r--r-- 1 ubuntu supergroup 0 2014-03-08 12:02 /user/ubuntu/out/_SUCCESS
drwxr-xr-x - ubuntu supergroup 0 2014-03-08 12:01 /user/ubuntu/out/_logs
-rw-r--r-- 1 ubuntu supergroup 25 2014-03-08 12:02 /user/ubuntu/out/part-r-00000
View the contents of the result file part-r-00000:
ubuntu@ubuntu-K50ID:~/hadoop-1.2.1$ bin/hadoop fs -cat ./out/part-r-00000
hadoop 1
hello 2
world 1
At this point the wordcount test is complete, confirming that Hadoop has started and is running normally.
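The same counts can be reproduced locally with standard Unix tools, which makes a handy cross-check of the MapReduce output (a sketch; it recreates the two test files in the current directory):

```shell
# Recreate the inputs and count words locally, mirroring the wordcount job:
# split on spaces, sort, then count duplicates.
printf 'hello world\n'  > test1.txt
printf 'hello hadoop\n' > test2.txt
cat test1.txt test2.txt | tr -s ' ' '\n' | sort | uniq -c | sort -k2
```

The result lists hadoop 1, hello 2, world 1, matching part-r-00000 above.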
4. Access Hadoop through the web UI
Monitor the JobTracker by browsing to port 50030 on the JobTracker node: http://192.168.1.104:50030/
Monitor the cluster by browsing to port 50070 on the NameNode node: http://192.168.1.104:50070/
Hadoop's security mechanisms are still immature, so it is best run only on an internal network.
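The monitoring URLs can also be built and probed from a script. A minimal sketch (the master IP 192.168.1.104 comes from this setup; the commented-out probe assumes curl is installed and the master is reachable):

```shell
# Build the Hadoop 1.x web UI URLs for a given master host.
master=192.168.1.104
jobtracker_url="http://${master}:50030/"   # JobTracker monitoring page
namenode_url="http://${master}:50070/"     # NameNode / cluster monitoring page
echo "$jobtracker_url"
echo "$namenode_url"
# Probe reachability from the command line (needs network access to the master):
#   curl -sf "$namenode_url" >/dev/null && echo "NameNode UI reachable"
```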