Learning Hadoop through a simple example: counting how many times each word occurs.

Environment: OS: CentOS 6.5 x64; software: Hadoop 1.2.1

1. Create a folder and write some text into text files.

[huser@master hadoop-1.2.1]$ mkdir ../input
[huser@master hadoop-1.2.1]$ echo "hello world" > ../input/test1.txt
[huser@master hadoop-1.2.1]$ echo "hello hadoop" > ../input/test2.txt

2. Upload the folder to HDFS

[huser@master hadoop-1.2.1]$ bin/hadoop fs -put ../input ./in

[huser@master hadoop-1.2.1]$ bin/hadoop fs -ls
Found 1 items
drwxr-xr-x - huser supergroup 0 2014-04-16 19:00 /user/huser/in

[huser@master hadoop-1.2.1]$ bin/hadoop fs -ls ./in/*
-rw-r--r-- 1 huser supergroup 12 2014-04-16 19:00 /user/huser/in/test1.txt
-rw-r--r-- 1 huser supergroup 13 2014-04-16 19:00 /user/huser/in/test2.txt
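The sizes 12 and 13 in the listing are byte counts: the text written by echo plus one trailing newline. Here is a minimal local check, assuming a POSIX shell; the helper name size_of_echo is made up for illustration and is not part of Hadoop.

```shell
# Hypothetical helper: byte length of what `echo TEXT` writes,
# i.e. the text itself plus the trailing newline.
size_of_echo() {
    printf '%s\n' "$1" | wc -c | tr -d ' '
}

size_of_echo "hello world"     # prints 12
size_of_echo "hello hadoop"    # prints 13
```

The two sizes add up to 25, which is the value the job later reports as "Bytes Read" under File Input Format Counters.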

3. Run the bundled wordcount example (in is the HDFS input directory, out is the output directory)

[huser@master hadoop-1.2.1]$ bin/hadoop jar hadoop-examples-1.2.1.jar wordcount in out
14/04/16 19:02:53 INFO input.FileInputFormat: Total input paths to process : 2
14/04/16 19:02:53 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/04/16 19:02:53 WARN snappy.LoadSnappy: Snappy native library not loaded
14/04/16 19:02:54 INFO mapred.JobClient: Running job: job_201404161850_0001
14/04/16 19:02:55 INFO mapred.JobClient: map 0% reduce 0%
14/04/16 19:03:10 INFO mapred.JobClient: map 100% reduce 0%
14/04/16 19:03:19 INFO mapred.JobClient: map 100% reduce 33%
14/04/16 19:03:21 INFO mapred.JobClient: map 100% reduce 100%
14/04/16 19:03:23 INFO mapred.JobClient: Job complete: job_201404161850_0001
14/04/16 19:03:23 INFO mapred.JobClient: Counters: 30
14/04/16 19:03:23 INFO mapred.JobClient: Job Counters 
14/04/16 19:03:23 INFO mapred.JobClient: Launched reduce tasks=1
14/04/16 19:03:23 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=19368
14/04/16 19:03:23 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/04/16 19:03:23 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/04/16 19:03:23 INFO mapred.JobClient: Rack-local map tasks=1
14/04/16 19:03:23 INFO mapred.JobClient: Launched map tasks=2
14/04/16 19:03:23 INFO mapred.JobClient: Data-local map tasks=1
14/04/16 19:03:23 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=11082
14/04/16 19:03:23 INFO mapred.JobClient: File Output Format Counters 
14/04/16 19:03:23 INFO mapred.JobClient: Bytes Written=25
14/04/16 19:03:23 INFO mapred.JobClient: FileSystemCounters
14/04/16 19:03:23 INFO mapred.JobClient: FILE_BYTES_READ=55
14/04/16 19:03:23 INFO mapred.JobClient: HDFS_BYTES_READ=239
14/04/16 19:03:23 INFO mapred.JobClient: FILE_BYTES_WRITTEN=169887
14/04/16 19:03:23 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=25
14/04/16 19:03:23 INFO mapred.JobClient: File Input Format Counters 
14/04/16 19:03:23 INFO mapred.JobClient: Bytes Read=25
14/04/16 19:03:23 INFO mapred.JobClient: Map-Reduce Framework
14/04/16 19:03:23 INFO mapred.JobClient: Map output materialized bytes=61
14/04/16 19:03:23 INFO mapred.JobClient: Map input records=2
14/04/16 19:03:23 INFO mapred.JobClient: Reduce shuffle bytes=61
14/04/16 19:03:23 INFO mapred.JobClient: Spilled Records=8
14/04/16 19:03:23 INFO mapred.JobClient: Map output bytes=41
14/04/16 19:03:23 INFO mapred.JobClient: Total committed heap usage (bytes)=415633408
14/04/16 19:03:23 INFO mapred.JobClient: CPU time spent (ms)=4060
14/04/16 19:03:23 INFO mapred.JobClient: Combine input records=4
14/04/16 19:03:23 INFO mapred.JobClient: SPLIT_RAW_BYTES=214
14/04/16 19:03:23 INFO mapred.JobClient: Reduce input records=4
14/04/16 19:03:23 INFO mapred.JobClient: Reduce input groups=3
14/04/16 19:03:23 INFO mapred.JobClient: Combine output records=4
14/04/16 19:03:23 INFO mapred.JobClient: Physical memory (bytes) snapshot=402755584
14/04/16 19:03:23 INFO mapred.JobClient: Reduce output records=3
14/04/16 19:03:23 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2173550592
14/04/16 19:03:23 INFO mapred.JobClient: Map output records=4
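To get a feel for what the job computes, the same count can be imitated locally with a plain shell pipeline. This is only a rough sketch and not how Hadoop implements it: the function name wordcount_local is made up, and it assumes words are separated by spaces or tabs. Splitting into words plays the role of the map phase, sort stands in for the shuffle, and uniq -c acts as the reduce.

```shell
# Hypothetical local imitation of the wordcount job (not Hadoop code).
# Reads the named files, or standard input when no file is given.
wordcount_local() {
    cat "$@" |
        tr -s ' \t' '\n\n' |   # "map": put every word on its own line
        sed '/^$/d' |          # drop empty lines left by leading blanks
        LC_ALL=C sort |        # "shuffle": bring identical words together
        uniq -c |              # "reduce": count each group of equal words
        awk '{ printf "%s\t%s\n", $2, $1 }'   # print word<TAB>count
}

# Same text as test1.txt and test2.txt
printf 'hello world\nhello hadoop\n' | wordcount_local
```

For the two test files this yields three lines for four input words, which lines up with the counters above: Map output records=4, Reduce input groups=3 and Reduce output records=3. With a tab between word and count the three lines take 25 bytes, the same as Bytes Written=25.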

4. View the results

[huser@master hadoop-1.2.1]$ bin/hadoop fs -ls ./out/*
-rw-r--r-- 1 huser supergroup 0 2014-04-16 19:03 /user/huser/out/_SUCCESS
drwxr-xr-x - huser supergroup 0 2014-04-16 19:02 /user/huser/out/_logs/history
-rw-r--r-- 1 huser supergroup 25 2014-04-16 19:03 /user/huser/out/part-r-00000

[huser@master hadoop-1.2.1]$ bin/hadoop fs -cat ./out/part-r-00000
hadoop 1
hello 2
world 1