HDFS Java API Programming in Practice: An HDFS Tutorial

Reposted

mob64ca13fdd43c 2023-10-18 19:29:59


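I. HDFS Command-Line Operations

1. Basic Syntax

[root@hadoop102 hadoop-2.7.2]# bin/hadoop fs <command>

When working with HDFS, bin/hdfs dfs <command> is equivalent.

2. Common Commands in Practice
(1) -help: print usage information for a command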

[root@hadoop102 hadoop-2.7.2]# bin/hdfs dfs -help rm

(2) -ls: list directory contents

[root@hadoop102 hadoop-2.7.2]# hadoop fs -ls /

(3) -mkdir: create a directory on HDFS

[root@hadoop102 hadoop-2.7.2]# hadoop fs  -mkdir  -p  /aaa/bbb/cc/dd

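(4) -moveFromLocal: move (cut and paste) a file from the local file system to HDFS

[root@hadoop102 hadoop-2.7.2]# hadoop  fs  -moveFromLocal  /home/hadoop/a.txt  /aaa/bbb/cc/dd

(5) -moveToLocal: move (cut and paste) a file from HDFS to the local file system. Note that in Hadoop 2.7.2 this option only prints a "not implemented yet" message.

[root@hadoop102 hadoop-2.7.2]# hadoop  fs  -moveToLocal   /aaa/bbb/cc/dd  /home/hadoop/a.txt

(6) -appendToFile: append a local file to the end of an existing file on HDFS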

[root@hadoop102 hadoop-2.7.2]# hadoop  fs  -appendToFile  ./hello.txt  /hello.txt

(7) -cat: display the contents of a file

[root@hadoop102 hadoop-2.7.2]# hadoop  fs  -cat  /hello.txt

(8) -tail: display the end of a file (the last kilobyte)

[root@hadoop102 hadoop-2.7.2]# hadoop  fs  -tail  /weblog/access_log.1

(9) -text: print the contents of a file as text

[root@hadoop102 hadoop-2.7.2]# hadoop  fs  -text  /weblog/access_log.1

(10) -chgrp, -chmod, -chown: used the same way as in the Linux file system; they change a file's group, permissions, and owner

[root@hadoop102 hadoop-2.7.2]# hadoop  fs  -chmod  666  /hello.txt
[root@hadoop102 hadoop-2.7.2]# hadoop  fs  -chown  someuser:somegrp   /hello.txt

(11) -copyFromLocal: copy a file from the local file system to an HDFS path

[root@hadoop102 hadoop-2.7.2]# hadoop  fs  -copyFromLocal  ./jdk.tar.gz  /aaa/

(12) -copyToLocal: copy a file from HDFS to the local file system

[root@hadoop102 hadoop-2.7.2]# hadoop fs -copyToLocal /aaa/jdk.tar.gz

(13) -cp: copy from one HDFS path to another HDFS path

[root@hadoop102 hadoop-2.7.2]# hadoop  fs  -cp  /aaa/jdk.tar.gz  /bbb/jdk.tar.gz.2

(14) -mv: move a file within HDFS

[root@hadoop102 hadoop-2.7.2]# hadoop  fs  -mv  /aaa/jdk.tar.gz  /

(15) -get: equivalent to copyToLocal; downloads a file from HDFS to the local file system

[root@hadoop102 hadoop-2.7.2]# hadoop fs -get  /aaa/jdk.tar.gz

(16) -getmerge: download multiple files merged into one, for example when the HDFS directory /aaa/ contains several files: log.1, log.2, log.3, …

[root@hadoop102 hadoop-2.7.2]# hadoop fs -getmerge /aaa/log.* ./log.sum

(17) -put: equivalent to copyFromLocal; the source is a local path

[root@hadoop102 hadoop-2.7.2]# hadoop  fs  -put  ./jdk.tar.gz  /bbb/jdk.tar.gz.2

(18) -rm: delete a file or directory (-r deletes recursively)

[root@hadoop102 hadoop-2.7.2]# hadoop fs -rm -r /aaa/bbb/

(19) -rmdir: delete an empty directory

[root@hadoop102 hadoop-2.7.2]# hadoop  fs  -rmdir   /aaa/bbb/ccc

(20) -df: report the free space of the file system

[root@hadoop102 hadoop-2.7.2]# hadoop  fs  -df  -h  /

(21) -du: report the size of a directory

[root@hadoop102 hadoop-2.7.2]# hadoop  fs  -du  -s  -h /aaa/*

(22) -count: count the directories, files, and bytes under a given directory

[root@hadoop102 hadoop-2.7.2]# hadoop fs -count /aaa/

(23) -setrep: set the replication factor of a file in HDFS

[root@hadoop102 hadoop-2.7.2]# hadoop fs -setrep 3 /aaa/jdk.tar.gz

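The replication factor set here is only recorded in the NameNode's metadata; whether that many replicas actually exist depends on the number of DataNodes. With only 3 machines in the current cluster, there can be at most 3 replicas. If the factor were set to 10, for example, the replica count would reach 10 only after the cluster grew to 10 nodes.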
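The relationship between the requested replication factor and the replicas that can actually exist can be sketched as below. This is a simplified model for illustration only, not Hadoop code: the class and method names are hypothetical, and it assumes each DataNode holds at most one replica of a given block.

```java
// Simplified model (hypothetical names, not part of the Hadoop API):
// the NameNode records the requested factor, but each DataNode can hold
// at most one replica of a given block.
class ReplicationSketch {

    /** Number of replicas that can physically exist right now. */
    static int effectiveReplicas(int requestedReplication, int liveDataNodes) {
        return Math.min(requestedReplication, liveDataNodes);
    }

    public static void main(String[] args) {
        // Requested 10 on a 3-node cluster: only 3 replicas can be placed.
        System.out.println(effectiveReplicas(10, 3));   // prints 3
        // Once the cluster has 10 nodes, all 10 replicas can be placed.
        System.out.println(effectiveReplicas(10, 10));  // prints 10
    }
}
```

The requested value stays in the metadata, so the missing replicas can be created later when more nodes join.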

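II. HDFS Client Operations

The examples here use Eclipse. The overall procedure has four steps:

Step 1: Create a Configuration object that points at the cluster
There are two ways to do this. Option 1:

/* Assume the cluster address is hdfs://hadoop102:8020 */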
Configuration configuration = new Configuration();
configuration.set("fs.defaultFS", "hdfs://hadoop102:8020");

Alternatively, you can use a configuration file. Option 2:
First create a core-site.xml file in the project's src directory with the following content:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<!-- Address of the NameNode in HDFS -->
	<property>
		<name>fs.defaultFS</name>
        <value>hdfs://hadoop102:8020</value>
	</property>

	<!-- Directory where Hadoop stores files generated at runtime -->
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/opt/module/hadoop-2.7.2/data/tmp</value>
	</property>
</configuration>

Once this is configured, simply create a new Configuration(), as follows:

Configuration configuration = new Configuration();

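★ Note the parameter priority: values set in client code (option 1) > user-defined configuration files on the classpath (option 2) > the server's default configuration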
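The priority order above amounts to a layered lookup. The sketch below models it with plain maps; it is an illustration under stated assumptions, not Hadoop's Configuration implementation. The class name, the resolve method, and the hadoop103 address are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model of the lookup order (hypothetical names, not Hadoop code).
class ConfigPrioritySketch {

    /** Returns the value from the highest-priority layer that defines the key. */
    static String resolve(String key,
                          Map<String, String> setInCode,
                          Map<String, String> classpathFile,
                          Map<String, String> defaults) {
        if (setInCode.containsKey(key)) {
            return setInCode.get(key);      // 1. set in client code
        }
        if (classpathFile.containsKey(key)) {
            return classpathFile.get(key);  // 2. core-site.xml on the classpath
        }
        return defaults.get(key);           // 3. default configuration
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new HashMap<>();
        defaults.put("fs.defaultFS", "file:///");
        Map<String, String> classpathFile = new HashMap<>();
        classpathFile.put("fs.defaultFS", "hdfs://hadoop102:8020");
        Map<String, String> setInCode = new HashMap<>();

        // Only the file defines an override, so the file's value is used.
        System.out.println(resolve("fs.defaultFS", setInCode, classpathFile, defaults));

        // A value set in code (hypothetical host hadoop103) beats the file.
        setInCode.put("fs.defaultFS", "hdfs://hadoop103:8020");
        System.out.println(resolve("fs.defaultFS", setInCode, classpathFile, defaults));
    }
}
```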

Step 2: Get the FileSystem
There are two ways to do this. Option 1:

FileSystem fs = FileSystem.get(configuration);

Option 2:

FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"),configuration, "root");

★ The difference between the two: with option 1, you must open "Run Configurations" before running and enter the following under "Arguments" > "VM arguments":

-DHADOOP_USER_NAME=root

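In other words, you must choose the root user (or another user on the server) to operate on HDFS. If nothing is entered here, the client defaults to the Windows user name (for example, Administrator), which results in an error. With option 2 you can run the program directly, because the user name is already passed to the method as an argument.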
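The way the effective user is chosen can be pictured as a simple fallback chain. The sketch below is a simplified model, not Hadoop's actual login code; the class and method names are hypothetical. It only reflects the behavior described above: an explicitly passed user wins, then HADOOP_USER_NAME, then the operating-system user.

```java
// Simplified model of user selection (hypothetical names, not Hadoop code).
class UserResolutionSketch {

    /**
     * explicitUser   : the third argument of FileSystem.get(uri, conf, user), or null
     * hadoopUserName : the value supplied via -DHADOOP_USER_NAME=..., or null
     * osUser         : the operating-system login name
     */
    static String resolveUser(String explicitUser, String hadoopUserName, String osUser) {
        if (explicitUser != null && !explicitUser.isEmpty()) {
            return explicitUser;
        }
        if (hadoopUserName != null && !hadoopUserName.isEmpty()) {
            return hadoopUserName;
        }
        return osUser;
    }

    public static void main(String[] args) {
        // Run with -DHADOOP_USER_NAME=root to see the VM argument take effect.
        String fromVmArgument = System.getProperty("HADOOP_USER_NAME");
        String osUser = System.getProperty("user.name");
        System.out.println(resolveUser(null, fromVmArgument, osUser));
    }
}
```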

Step 3: Call methods on the file system

fs.mkdirs(new Path("/user/root/other")); // or any other FileSystem method

Step 4: Close resources

fs.close();
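Because FileSystem implements java.io.Closeable, the close step can also be left to try-with-resources, which closes the resource even when the work in between throws. The sketch below shows the ordering with a stand-in class; FakeFileSystem is hypothetical and only records what happens, so the example runs without a cluster.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

// Demonstrates close ordering with a stand-in resource (not the Hadoop FileSystem).
class CloseOrderSketch {

    /** Hypothetical stand-in that logs its lifecycle. */
    static final class FakeFileSystem implements Closeable {
        private final List<String> log;

        FakeFileSystem(List<String> log) {
            this.log = log;
            log.add("open");
        }

        void doWork(boolean fail) {
            log.add("work");
            if (fail) {
                throw new IllegalStateException("simulated failure");
            }
        }

        @Override
        public void close() {
            log.add("close");
        }
    }

    /** Runs the open / work / close sequence and returns the recorded events. */
    static List<String> run(boolean fail) {
        List<String> log = new ArrayList<>();
        try (FakeFileSystem fs = new FakeFileSystem(log)) {
            fs.doWork(fail);
        } catch (IllegalStateException e) {
            // The resource has already been closed when this block runs.
            log.add("caught");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints [open, work, close]
        System.out.println(run(true));  // prints [open, work, close, caught]
    }
}
```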

III. Common Client Operations

1. API Operations

package hdfs;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.junit.Test;

public class HDFSClient {
	
	public static void main(String[] args) throws Exception {
		// 0 Get the configuration
		Configuration configuration = new Configuration();
		configuration.set("fs.defaultFS", "hdfs://hadoop102:8020");
		
		// 1 Get the file system
		//FileSystem fs = FileSystem.get(configuration);
		FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "root");
		
		// 2 Copy local data to the cluster
		fs.copyFromLocalFile(new Path("D:/buffer.txt"), new Path("/user/root/buffer.txt"));
		
		// 3 Close the file system
		fs.close();
	}
	
	// Get the file system
	@Test
	public void getFileSystem() throws Exception {
		
		// 0 Create the configuration object
		Configuration configuration = new Configuration();
		
		// 1 Get the file system (this variant relies on fs.defaultFS from core-site.xml on the classpath)
		//FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "root");
		FileSystem fs = FileSystem.get(configuration);
		
		// 2 Print the file system
		System.out.println(fs.toString());
		
		// 3 Close the file system
		fs.close();
		
	}
	
	// Upload a file
	@Test
	public void putFileToHDFS() throws Exception {
		
		// 0 Create the configuration object
		Configuration configuration = new Configuration();
				
		// 1 Get the file system
		FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "root");
		
		// 2 Upload the file (delSrc = true deletes the local source after copying)
		fs.copyFromLocalFile(true, new Path("D:/buffer1.txt"), new Path("/user/root/buffer1.txt"));
		
		// 3 Close resources
		fs.close();
	}
	
	// Download a file
	@Test
	public void getFileFromHDFS() throws Exception {
		
		// 1 Get the file system
		FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), new Configuration(),"root");
		
		// 2 Download the file
		fs.copyToLocalFile(new Path("/user/root/buffer1.txt"), new Path("D:/buffer1.txt"));
		
		// 3 Close resources
		fs.close();
	}
	
	// Create a directory on the cluster
	@Test
	public void mkdirAtHDFS() throws Exception {
		
		// 1 Get the file system
		FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), new Configuration(),"root");
				
		// 2 Create the directory (missing parent directories are created as well)
		fs.mkdirs(new Path("/user/root/other"));
				
		// 3 Close resources
		fs.close();		
	}
	
	// Delete a file or directory
	@Test
	public void deleteAtHDFS() throws Exception {
		
		// 1 Get the file system
		FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), new Configuration(),"root");
		
		// 2 Delete the path; the boolean argument says whether to delete recursively
		boolean b = fs.delete(new Path("/user/root/buffer1.txt"), true);
		System.out.println(b);
		
		// 3 Close resources
		fs.close();
	}
	
	// Rename a file
	@Test
	public void renameAtHDFS() throws Exception {
		
		// 1 Get the file system
		FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), new Configuration(),"root");
		
		// 2 Rename the file
		fs.rename(new Path("/user/root/buffer.txt"), new Path("/user/root/buffer1.txt"));
		
		// 3 Close resources
		fs.close();
	}
	
	// View file details
	@Test
	public void readFileAtHDFS() throws Exception {

		// 1 Get the file system
		FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), new Configuration(),"root");
		
		// 2 List all files under the root directory recursively and print their details
		RemoteIterator<LocatedFileStatus> files = fs.listFiles(new Path("/"), true);
		while(files.hasNext()) {
			LocatedFileStatus status = files.next();
			// File name
			System.out.println(status.getPath().getName());
			// Block size
			System.out.println(status.getBlockSize());
			// Content length in bytes
			System.out.println(status.getLen());
			// File permissions
			System.out.println(status.getPermission());
			
			System.out.println("----------");
			
			// Details of each block of the file
			BlockLocation[] locations = status.getBlockLocations();
			for (BlockLocation block : locations) {
				// Start offset of this block within the file: the first block starts at 0;
				// with 128 MB blocks (134217728 bytes) the second block starts at 134217728,
				// because the first block covers offsets 0 to 134217727
				System.out.println(block.getOffset());
				
				// Hosts that store a replica of this block
				String[] hosts = block.getHosts();
				for (String s : hosts) {
					System.out.println(s);
				}
			}	
		}
		
		// 3 Close resources
		fs.close();
	}
	
	// List directories and files
	@Test
	public void readFolderAtHDFS() throws Exception {

		// 1 Get the file system
		FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), new Configuration(),"root");
		
		// 2 Check whether each entry is a file or a directory
		FileStatus[] listStatus = fs.listStatus(new Path("/user/root"));
		for (FileStatus status : listStatus) {
			if(status.isFile()) {
				System.out.println("f----"+status.getPath().getName());
			}else {
				System.out.println("d----"+status.getPath().getName());
			}
		}
		
		// 3 Close resources
		fs.close();
	}
}
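The offsets printed by block.getOffset() in the listing above follow directly from the block size. The sketch below reproduces the arithmetic without a cluster; it is an illustration only, and the class name, the blockOffsets method, and the 200 MB file length are hypothetical.

```java
// Computes where each block of a file starts (hypothetical helper, not Hadoop code).
class BlockLayoutSketch {

    /** Start offset of every block for a file of the given length. */
    static long[] blockOffsets(long fileLength, long blockSize) {
        if (fileLength <= 0) {
            return new long[0];
        }
        // Round up: a partially filled last block still counts as a block.
        int blocks = (int) ((fileLength + blockSize - 1) / blockSize);
        long[] offsets = new long[blocks];
        for (int i = 0; i < blocks; i++) {
            offsets[i] = i * blockSize;
        }
        return offsets;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;   // 134217728 bytes
        long fileLength = 200L * 1024 * 1024;  // assumed example size
        for (long offset : blockOffsets(fileLength, blockSize)) {
            System.out.println(offset);        // prints 0, then 134217728
        }
    }
}
```

Offsets are zero-based, so the second block begins at exactly 134217728, the first byte after the 128 MB covered by the first block.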

2. I/O Stream Operations

package hdfs;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.junit.Test;

public class IOToHDFS {
	
	// Upload a file
	@Test
	public void putFileToHDFS() throws Exception {
		
		// 1 Get the file system
		Configuration configuration = new Configuration();
		FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "root");
		
		// 2 Get the output stream (this stream belongs to HDFS)
		FSDataOutputStream fos = fs.create(new Path("/user/root/output/buffer.txt"));
		
		// 3 Get the input stream (the local file)
		FileInputStream fis = new FileInputStream("D:/buffer.txt");
		
		try {
			// 4 Connect the streams
			IOUtils.copyBytes(fis, fos, configuration);
		} catch (Exception e) {
			// Report the failure instead of silently swallowing it
			e.printStackTrace();
		}finally {
			// 5 Close resources
			IOUtils.closeStream(fis);
			IOUtils.closeStream(fos);
			fs.close();
		}	
	}
	
	// Download a file
	@Test
	public void getFileFromHDFS() throws Exception {
		// 1 Get the file system
		Configuration configuration = new Configuration();
		FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "root");

		// 2 Get the input stream (this stream belongs to HDFS)
		FSDataInputStream fis = fs.open(new Path("/user/root/buffer.txt"));

		// 3 Create the output stream (the local file)
		FileOutputStream fos = new FileOutputStream(new File("D:\\buffer.txt"));

		try {
			// 4 Connect the streams
			IOUtils.copyBytes(fis, fos, configuration);
		} catch (Exception e) {
			e.printStackTrace();
		} finally {
			// 5 Close resources
			IOUtils.closeStream(fis);
			IOUtils.closeStream(fos);
			fs.close();
		}
	}
	
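	// Download the first block of a large file
	@Test
	public void getFileFromHDFSSeek1() throws Exception {
		// 1 Get the file system
		Configuration configuration = new Configuration();
		FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "root");

		// 2 Get the input stream
		FSDataInputStream fis = fs.open(new Path("/user/root/input/hadoop-2.7.2.tar.gz"));

		// 3 Create the output stream
		FileOutputStream fos = new FileOutputStream(new File("D:\\hadoop-2.7.2.tar.gz.part1"));

		try {
			// 4 Connect the streams (copy only the first 128 MB)
			byte[] bytes = new byte[1024];
			long remaining = 1024L * 1024 * 128;
			int len;
			// read() may return fewer bytes than requested, so write only what was actually read
			while (remaining > 0
					&& (len = fis.read(bytes, 0, (int) Math.min(bytes.length, remaining))) != -1) {
				fos.write(bytes, 0, len);
				remaining -= len;
			}
		} catch (Exception e) {
			e.printStackTrace();
		} finally {
			// 5 Close resources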
			IOUtils.closeStream(fis);
			IOUtils.closeStream(fos);
			fs.close();
		}
	}
	
	
	// Download the second block of a large file
	@Test
	public void getFileFromHDFSSeek2() throws Exception {
		// 1 Get the file system
		Configuration configuration = new Configuration();
		FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "root");

		// 2 Get the input stream
		FSDataInputStream fis = fs.open(new Path("/user/root/input/hadoop-2.7.2.tar.gz"));

		// 3 Create the output stream
		FileOutputStream fos = new FileOutputStream(new File("D:\\hadoop-2.7.2.tar.gz.part2"));

		try {
			// 4 Connect the streams (starting at the first byte of the second block)
			// 4.1 Seek to the 128 MB mark
			fis.seek(1024L * 1024 * 128);
			// 4.2 Copy the rest of the stream
			IOUtils.copyBytes(fis, fos, configuration);
		} catch (Exception e) {
			e.printStackTrace();
		} finally {
			// 5 Close resources
			IOUtils.closeStream(fis);
			IOUtils.closeStream(fos);
			fs.close();
		}
	}
}
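The two block downloads above come down to a bounded copy followed by a copy that starts at an offset. The sketch below reproduces that pattern with in-memory streams so it runs without a cluster. The class and method names are hypothetical, and InputStream.skip stands in for FSDataInputStream.seek. Joining the two parts restores the original bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// In-memory illustration of "first N bytes" and "everything after N" (hypothetical names).
class SplitAndJoinSketch {

    /** Copies at most limit bytes, writing only the bytes each read() actually returned. */
    static long copyAtMost(InputStream in, OutputStream out, long limit) throws IOException {
        byte[] buffer = new byte[4]; // deliberately tiny so several reads are needed
        long copied = 0;
        while (copied < limit) {
            int want = (int) Math.min(buffer.length, limit - copied);
            int n = in.read(buffer, 0, want);
            if (n == -1) {
                break; // end of stream reached before the limit
            }
            out.write(buffer, 0, n);
            copied += n;
        }
        return copied;
    }

    /** Bytes [0, splitAt), like downloading the first block. */
    static byte[] firstPart(byte[] data, int splitAt) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            copyAtMost(new ByteArrayInputStream(data), out, splitAt);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Bytes [splitAt, end), like seeking to the second block and copying the rest. */
    static byte[] secondPart(byte[] data, int splitAt) {
        try {
            ByteArrayInputStream in = new ByteArrayInputStream(data);
            in.skip(splitAt); // stand-in for fis.seek(splitAt)
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            copyAtMost(in, out, Long.MAX_VALUE);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Concatenates the two parts back into one array. */
    static byte[] join(byte[] first, byte[] second) {
        byte[] all = new byte[first.length + second.length];
        System.arraycopy(first, 0, all, 0, first.length);
        System.arraycopy(second, 0, all, first.length, second.length);
        return all;
    }

    public static void main(String[] args) {
        byte[] data = "hello hdfs world".getBytes(StandardCharsets.US_ASCII);
        byte[] part1 = firstPart(data, 6);
        byte[] part2 = secondPart(data, 6);
        System.out.println(new String(join(part1, part2), StandardCharsets.US_ASCII));
    }
}
```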
