
Push command:         copy the user directory on bigdata111 to the home directory on bigdata112

                               -r is required for a directory; omit it for a single file

[root@bigdata111 ~]# scp -r user root@bigdata112:/root/
itstar                                        100%  121     0.1KB/s   00:00    
aa                                            100%    0     0.0KB/s   00:00
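
The -r rule above can be wrapped in a small helper. This is only a sketch and not part of the original notes: the function name scp_push_cmd is hypothetical, it assumes paths without spaces, and it prints the scp command instead of running it so the result can be checked first.

```shell
# Hypothetical helper (not from the original notes): print the scp command
# for pushing a local path, adding -r only when the path is a directory.
# Assumes the paths contain no spaces.
scp_push_cmd() {
    local src=$1 dest=$2
    if [ -d "$src" ]; then
        printf 'scp -r %s %s\n' "$src" "$dest"
    else
        printf 'scp %s %s\n' "$src" "$dest"
    fi
}

# Example: only prints the command; nothing is copied.
scp_push_cmd user root@bigdata112:/root/
```

When pulling, the source lives on the remote host, so a local directory test cannot decide for you and -r has to be chosen by hand.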

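Pull command:   copy the /root/plus1 directory from bigdata111 to the home directory on the local machine, bigdata112

                     -r is required for a directory; omit it for a single file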

[root@bigdata112 ~]# scp -r root@bigdata111:/root/plus1 /root/
test                                          100%    0     0.0KB/s   00:00    
123                                           100%    0     0.0KB/s   00:00    
456                                           100% 5395     5.3KB/s   00:00

 B:  Archive operations:










Usage: hadoop archive -archiveName <archive name> -p <parent dir> [-r <replication factor>] <source path(s), one or more> <destination path>

bin/hadoop archive -archiveName foo.har -p /plus -r 3 a b c /
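
To make the argument order in the usage line easier to see, here is a minimal sketch that assembles the command from its parts. The function name har_cmd and its argument order are hypothetical, the optional -r flag is left out for brevity, and the function only prints the command; it does not contact a cluster.

```shell
# Hypothetical wrapper (not from the original notes): assemble a
# `hadoop archive` command line and print it for review.
# Arguments: archive name, parent dir, destination dir, then zero or more
# source paths relative to the parent dir.
har_cmd() {
    local name=$1 parent=$2 dest=$3
    shift 3
    # The archive name is expected to end in .har.
    case $name in
        *.har) ;;
        *) echo "archive name must end in .har: $name" >&2; return 1 ;;
    esac
    if [ "$#" -gt 0 ]; then
        echo "hadoop archive -archiveName $name -p $parent $* $dest"
    else
        # No sources listed: only the parent dir is passed, as in the
        # foo1.har run in these notes.
        echo "hadoop archive -archiveName $name -p $parent $dest"
    fi
}

har_cmd foo.har /plus / a b c
har_cmd foo1.har /test2.java /
```

Source paths are given relative to the parent directory, which is why a, b and c above carry no leading slash.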


[root@bigdata111 ~]# ll
总用量 527772
-rwx-wx---. 2 liqing root        5500 9月  28 21:52 123
-rwx-wx---. 2 liqing root        5500 9月  28 21:52 aa
drwxrwxrwx. 3 root   root          15 9月  17 12:30 aaaaa
drwxr-xr-x. 2 root   root           6 9月  19 22:07 aaaaaaaa
-rw-r--r--. 1 root   root         194 9月  17 19:42 aa.zip
-rw-------. 1 root   root        1536 7月  28 19:07 anaconda-ks.cfg
-rwxrwxrwx. 1 liqing liqing        27 9月  28 21:52 bb
lrwxrwxrwx. 1 root   root           2 9月  18 18:55 blianjie -> bb
-rw-r--r--. 1 root   root          28 10月  4 21:14 cc
-rw-r--r--. 1 root   root         189 9月  17 14:40 dd1.gz
-rw-r--r--. 1 root   root        1583 9月  18 16:35 ddddddd
-rw-r--r--. 1 root   root         189 9月  17 14:36 dd.gz
-rw-r--r--. 1 root   root         564 9月  17 14:37 ff.gz
-rw-r--r--. 1 root   root           4 9月  17 23:19 gg
-rwxrwxrwx. 1 root   root          16 9月  28 23:32 hh
drwxr-xr-x. 4 root   root          28 9月  18 14:45 itstar
drwxr-xr-x. 3 root   root          46 8月   2 19:16 liqing
lrwxrwxrwx. 1 root   root           2 9月  17 12:32 mm -> aa
drwxr-xr-x. 2 root   root          29 9月  18 14:17 mod222
-rw-r--r--. 1 root   root         108 9月  17 14:33 mod.gz
-rw-r--r--. 1 root   root          10 9月  28 23:40 ooo
drwxr-xr-x. 2 root   root          30 10月  8 18:02 plus
drwxr-xr-x. 4 root   root          29 9月  18 14:35 plus1
-rw-r--r--. 1 root   root   540330028 8月  23 20:31 Python素材.rar
-rw-r--r--. 1 root   root       15650 10月  8 22:59 ss
-rwxrwxrwx. 1 root   root           0 8月   7 11:15 test1.java
drwxr-xr-x. 2   1001   1001        43 10月  8 18:29 test2.java


[root@bigdata111 ~]# hdfs dfs -put test2.java /


[root@bigdata111 ~]#  hadoop archive -archiveName foo1.har -p /test2.java /



19/10/08 22:57:15 INFO client.RMProxy: Connecting to ResourceManager at bigdata112/
19/10/08 22:57:17 INFO client.RMProxy: Connecting to ResourceManager at bigdata112/
19/10/08 22:57:17 INFO client.RMProxy: Connecting to ResourceManager at bigdata112/
19/10/08 22:57:18 INFO mapreduce.JobSubmitter: number of splits:1
19/10/08 22:57:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1570522163334_0004
19/10/08 22:57:19 INFO impl.YarnClientImpl: Submitted application application_1570522163334_0004
19/10/08 22:57:19 INFO mapreduce.Job: The url to track the job: http://bigdata112:8088/proxy/application_1570522163334_0004/
19/10/08 22:57:19 INFO mapreduce.Job: Running job: job_1570522163334_0004
19/10/08 22:57:36 INFO mapreduce.Job: Job job_1570522163334_0004 running in uber mode : false
19/10/08 22:57:36 INFO mapreduce.Job:  map 0% reduce 0%
19/10/08 22:57:51 INFO mapreduce.Job:  map 100% reduce 0%
19/10/08 22:58:05 INFO mapreduce.Job:  map 100% reduce 100%
19/10/08 22:58:06 INFO mapreduce.Job: Job job_1570522163334_0004 completed successfully
19/10/08 22:58:06 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=292
		FILE: Number of bytes written=319701
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=14643
		HDFS: Number of bytes written=14489
		HDFS: Number of read operations=19
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=8
	Job Counters 
		Launched map tasks=1
		Launched reduce tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=13339
		Total time spent by all reduces in occupied slots (ms)=10992
		Total time spent by all map tasks (ms)=13339
		Total time spent by all reduce tasks (ms)=10992
		Total vcore-milliseconds taken by all map tasks=13339
		Total vcore-milliseconds taken by all reduce tasks=10992
		Total megabyte-milliseconds taken by all map tasks=13659136
		Total megabyte-milliseconds taken by all reduce tasks=11255808
	Map-Reduce Framework
		Map input records=4
		Map output records=4
		Map output bytes=278
		Map output materialized bytes=292
		Input split bytes=116
		Combine input records=0
		Combine output records=0
		Reduce input groups=4
		Reduce shuffle bytes=292
		Reduce input records=4
		Reduce output records=0
		Spilled Records=8
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=405
		CPU time spent (ms)=4800
		Physical memory (bytes) snapshot=318402560
		Virtual memory (bytes) snapshot=4166209536
		Total committed heap usage (bytes)=182063104
	Shuffle Errors
	File Input Format Counters 
		Bytes Read=323
	File Output Format Counters 
		Bytes Written=0

5. The archive you named, foo1.har, now appears in the Web UI:

drwxr-xr-x	root	supergroup	0 B	Oct 08 22:58	0	0 B	foo1.har

6. Click into the foo1.har directory to see the generated files:

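A Hadoop archive is a specially formatted archive that maps to a file system directory. A Hadoop archive always has a *.har extension.


The archive directory contains metadata files (_index and _masterindex) and data (part-*) files.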


-rw-r--r--	root	supergroup	0 B	Oct 08 22:58	3	128 MB	_SUCCESS	
-rw-r--r--	root	supergroup	262 B	Oct 08 22:58	3	128 MB	_index	
-rw-r--r--	root	supergroup	23 B	Oct 08 22:58	3	128 MB	_masterindex	
-rw-r--r--	root	supergroup	13.87 KB	Oct 08 22:57	3	512 MB	part-0
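
The layout described above can be checked mechanically. The following is a sketch under stated assumptions: the function name looks_like_har is hypothetical, and it reads bare file names (one per line) from standard input instead of querying HDFS itself.

```shell
# Hypothetical check (not from the original notes): read file names, one per
# line, and succeed only if they include _index, _masterindex and at least
# one part-* data file.
looks_like_har() {
    local has_index=0 has_master=0 has_part=0 name
    while IFS= read -r name; do
        case $name in
            _index)       has_index=1 ;;
            _masterindex) has_master=1 ;;
            part-*)       has_part=1 ;;
        esac
    done
    [ "$has_index" -eq 1 ] && [ "$has_master" -eq 1 ] && [ "$has_part" -eq 1 ]
}

# Example with the four names from the listing above.
printf '%s\n' _SUCCESS _index _masterindex part-0 | looks_like_har && echo "valid layout"
```

On a cluster the names would come from a listing of the archive directory with the directory prefix stripped.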





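Extract foo1.har into the /itstar directory on HDFS (extraction is just a copy out of the archive)

List the archive first:      hadoop fs -lsr har:///foo1.har      (-lsr is deprecated; the output below suggests 'ls -R')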

[root@bigdata111 ~]# hadoop fs -lsr har:///foo1.har
lsr: DEPRECATED: Please use 'ls -R' instead.
-rw-r--r--   3 root supergroup      12288 2019-10-08 22:56 har:///foo1.har/.swp
-rw-r--r--   3 root supergroup          9 2019-10-08 22:56 har:///foo1.har/aa
-rw-r--r--   3 root supergroup       1907 2019-10-08 22:56 har:///foo1.har/test1.java
[root@bigdata111 ~]# hadoop fs -cp har:///foo1.har/* /itstar
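
Paths inside an archive are addressed with the har scheme, as in the transcript above. A small sketch of how such URIs are put together follows; the helper name har_uri is hypothetical, it targets the default file system only, and the two commands are printed, not executed. The listing uses -ls -R, as the deprecation message recommends.

```shell
# Hypothetical helper (not from the original notes): build a har:// URI on
# the default file system from an archive path and an optional path inside
# the archive.
har_uri() {
    local archive=$1 inner=${2:-}
    if [ -n "$inner" ]; then
        echo "har://${archive}/${inner}"
    else
        echo "har://${archive}"
    fi
}

# Print (do not run) a recursive listing and a single-file copy.
echo "hadoop fs -ls -R $(har_uri /foo1.har)"
echo "hadoop fs -cp $(har_uri /foo1.har aa) /itstar"
```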



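Extract in parallel with DistCp, which runs as a MapReduce job (the target directory here is /123):

[root@bigdata111 ~]# hadoop distcp har:/foo1.har /123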


19/10/08 23:26:54 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, overwrite=false, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=100, sslConfigurationFile='null', copyStrategy='uniformsize', preserveStatus=[], preserveRawXattrs=false, atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[har:/foo1.har], targetPath=/123, targetPathExists=false, filtersFile='null'}
19/10/08 23:26:54 INFO client.RMProxy: Connecting to ResourceManager at bigdata112/
19/10/08 23:26:55 INFO tools.SimpleCopyListing: Paths (files+dirs) cnt = 4; dirCnt = 1
19/10/08 23:26:55 INFO tools.SimpleCopyListing: Build file listing completed.
19/10/08 23:26:55 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
19/10/08 23:26:55 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
19/10/08 23:26:55 INFO tools.DistCp: Number of paths in the copy list: 4
19/10/08 23:26:55 INFO tools.DistCp: Number of paths in the copy list: 4
19/10/08 23:26:55 INFO client.RMProxy: Connecting to ResourceManager at bigdata112/
19/10/08 23:26:57 INFO mapreduce.JobSubmitter: number of splits:4
19/10/08 23:26:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1570522163334_0005
19/10/08 23:26:58 INFO impl.YarnClientImpl: Submitted application application_1570522163334_0005
19/10/08 23:26:58 INFO mapreduce.Job: The url to track the job: http://bigdata112:8088/proxy/application_1570522163334_0005/
19/10/08 23:26:58 INFO tools.DistCp: DistCp job-id: job_1570522163334_0005
19/10/08 23:26:58 INFO mapreduce.Job: Running job: job_1570522163334_0005
19/10/08 23:27:12 INFO mapreduce.Job: Job job_1570522163334_0005 running in uber mode : false
19/10/08 23:27:12 INFO mapreduce.Job:  map 0% reduce 0%
19/10/08 23:27:24 INFO mapreduce.Job:  map 25% reduce 0%
19/10/08 23:27:25 INFO mapreduce.Job:  map 50% reduce 0%
19/10/08 23:27:31 INFO mapreduce.Job:  map 100% reduce 0%
19/10/08 23:27:32 INFO mapreduce.Job: Job job_1570522163334_0005 completed successfully
19/10/08 23:27:32 INFO mapreduce.Job: Counters: 33
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=643036
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=17110
		HDFS: Number of bytes written=14204
		HDFS: Number of read operations=117
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=15
	Job Counters 
		Launched map tasks=4
		Other local map tasks=4
		Total time spent by all maps in occupied slots (ms)=51372
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=51372
		Total vcore-milliseconds taken by all map tasks=51372
		Total megabyte-milliseconds taken by all map tasks=52604928
	Map-Reduce Framework
		Map input records=4
		Map output records=0
		Input split bytes=536
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=499
		CPU time spent (ms)=3530
		Physical memory (bytes) snapshot=418123776
		Virtual memory (bytes) snapshot=8321691648
		Total committed heap usage (bytes)=138149888
	File Input Format Counters 
		Bytes Read=1230
	File Output Format Counters 
		Bytes Written=0
	DistCp Counters
		Bytes Copied=14204
		Bytes Expected=14204
		Files Copied=4


drwxr-xr-x	root	supergroup	0 B	Oct 08 23:27	0	0 B	123

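4. The data size is unchanged (the files extracted into /123 keep their original sizes); archiving only reduces the NameNode memory used for metadata: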

-rw-r--r--	root	supergroup	12 KB	Oct 08 23:27	3	128 MB	.swp	
-rw-r--r--	root	supergroup	9 B	Oct 08 23:27	3	128 MB	aa	
-rw-r--r--	root	supergroup	1.86 KB	Oct 08 23:27	3	128 MB	test1.java
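
The metadata saving can be illustrated with a back-of-the-envelope estimate. The sketch below assumes a commonly quoted rule of thumb of roughly 150 bytes of NameNode heap per file or block object; that figure, the function name namenode_bytes, and the one-block-per-file simplification are all assumptions and do not come from these notes.

```shell
# Rough estimate only. BYTES_PER_OBJECT is an assumed rule of thumb
# (about 150 bytes of NameNode heap per file or block object).
BYTES_PER_OBJECT=150

# Arguments: number of files, blocks per file.
# Each file counts as one object plus one object per block.
namenode_bytes() {
    local files=$1 blocks_per_file=$2
    echo $(( (files + files * blocks_per_file) * BYTES_PER_OBJECT ))
}

# 10000 small files stored individually, one block each.
namenode_bytes 10000 1
# The same data as one archive: four files (_SUCCESS, _index,
# _masterindex, part-0), each treated as one block for simplicity.
namenode_bytes 4 1
```

With only three small files, as in this walkthrough, there is nothing to gain; the difference matters when thousands of small files are packed into one archive.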