HDFS API Usage Code Notes


The Java abstract class org.apache.hadoop.fs.FileSystem defines Hadoop's file system interface. Because it is abstract, a FileSystem instance is obtained through one of the following two static factory methods:

public static FileSystem get(Configuration conf) throws IOException
public static FileSystem get(URI uri, Configuration conf) throws IOException

eg:

String filePath = "hdfs://ip:port/recycle/word.pdf”; //ip为namenode地址,port为hdfs端口号
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(filePath), conf);


String filePath ="hdfs://ip:port/recycle/word.pdf”; //ipnamenode地址,porthdfs端口号

Configurationconf = new Configuration();

FileSystem fs =FileSystem.get(URI.create(filePath), conf);

What follows are access and manipulation operations on the FileSystem instance fs.


1. Create a directory: public boolean mkdirs(Path f) throws IOException

eg:

String path = "hdfs://ip:port/”;
Configurationconf = new Configuration();
FileSystem fs =FileSystem.get(URI.create(path), conf);
boolean flag=fs.mkdirs(new Path (path+”目录名”));


String path = "hdfs://ip:port/”;

Configurationconf = new Configuration();

FileSystem fs =FileSystem.get(URI.create(path), conf);

boolean flag=fs.mkdirs(new Path (path+”目录名”));


2. Write data (upload a file to HDFS): public FSDataOutputStream create(Path f) throws IOException

eg:

File file = new File("d:\\文件上传\\ss.txt");
String path = "hdfs://ip:port/ss.txt”;
InputStream in = newBufferedInputStream(new FileInputStream(file)); //缓存文件
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(path),conf);//URI.create(hdfs_path)
OutputStream out = fs.create(new Path(path), new Progressable() {
@Override
publicvoid progress() {
                                System.out.print("*");
                    }
});
IOUtils.copyBytes(in, out,4096, true);




3. Read data (download a file from HDFS):

public FSDataInputStream open(Path f) throws IOException

eg:

String down_path = "hdfs://ip:port/ss.txt";
Configuration conf = new Configuration();
URI uri = URI.create(down_path);
FileSystem fs = FileSystem.get(uri, conf);
FSDataInputStream hdfsInStream = fs.open(new Path(down_path));
This yields the HDFS input stream hdfsInStream.
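
To complete the download, the stream can be copied to a local file. A minimal sketch, assuming an illustrative local destination path that mirrors the upload example:

OutputStream localOut = new FileOutputStream("d:\\download\\ss.txt"); // local destination (placeholder path)
IOUtils.copyBytes(hdfsInStream, localOut, 4096, true); // copy in 4 KB chunks and close both streams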




4. Check whether a file or directory exists: public boolean exists(Path f) throws IOException
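
A minimal sketch of an existence check, reusing the placeholder ip, port, and file path from the earlier examples:

String path = "hdfs://ip:port/recycle/word.pdf";
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(path), conf);
boolean found = fs.exists(new Path(path)); // true if the file or directory exists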


5. Delete a file or directory: public boolean delete(Path f, boolean recursive) throws IOException

Permanently deletes the specified file or directory. If f is an empty directory or a file, the value of recursive is ignored. A non-empty directory and its contents are deleted only when recursive is true.

Note: the corresponding shell command rm is a soft delete. If the Hadoop trash (Trash) is enabled, files removed with rm are moved to /user/root/.Trash/Current/, are permanently deleted after the configured retention period, and can be restored before then.
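
A minimal sketch of a recursive delete (the directory path is a placeholder):

String path = "hdfs://ip:port/recycle";
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(path), conf);
boolean deleted = fs.delete(new Path(path), true); // recursive = true removes a non-empty directory and its contents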


6. Copy from the local file system to HDFS: public void copyFromLocalFile(Path src, Path dst) throws IOException
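
A minimal sketch using copyFromLocalFile (the local and HDFS paths are placeholders):

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create("hdfs://ip:port/"), conf);
fs.copyFromLocalFile(new Path("d:\\upload\\ss.txt"), new Path("hdfs://ip:port/user/ss.txt")); // src is local, dst is on HDFS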


7. Query file status information:

The FileStatus class encapsulates file system metadata for files and directories, including file length, block size, replication, modification time, owner, and permissions.

Combined with listStatus(), FileStatus.getPath() lets you enumerate all files under a given HDFS directory.

eg:

String dst = "hdfs://ip:port/"+"user";//user目录下的所有文件
Configuration conf = new Configuration(); 
FileSystem fs = FileSystem.get(URI.create(dst), conf);
FileStatus fileList[] = null;
fileList = fs.listStatus(new Path(dst)); //文件路径
int size = fileList.length;//目录下所有文件数目
for(int i = 0; i < size; i++){ 
System.out.println(fileList[i].getPath().getName());//文件名
System.out.println(fileList[i].getLen());//文件大小
System.out.println(fileList[i].getModificationTime());//文件内容最后一次修改时间
}


String dst = "hdfs://ip:port/"+"user";//user目录下的所有文件

Configuration conf = new Configuration();  

FileSystem fs = FileSystem.get(URI.create(dst), conf);

FileStatus fileList[] = null;

fileList = fs.listStatus(new Path(dst)); //文件路径

int size = fileList.length;//目录下所有文件数目


for(int i = 0; i < size; i++){  

System.out.println(fileList[i].getPath().getName());//文件名

System.out.println(fileList[i].getLen());//文件大小

System.out.println(fileList[i].getModificationTime());//文件内容最后一次修改时间


}  


8. Append to a file: public FSDataOutputStream append(Path f) throws IOException
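
A minimal sketch of appending to an existing file, assuming the cluster permits appends (the path is a placeholder):

String path = "hdfs://ip:port/ss.txt";
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(path), conf);
FSDataOutputStream appendOut = fs.append(new Path(path)); // the file must already exist
appendOut.write("appended line\n".getBytes(StandardCharsets.UTF_8));
appendOut.close();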


9. Rename a file: public boolean rename(Path src, Path dst) throws IOException

The HDFS API has no dedicated interface for moving files; renaming can be used to achieve the same effect.

Corresponds to the shell command: mv
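
A minimal sketch of "moving" a file by renaming it into a different directory (the paths are placeholders):

String base = "hdfs://ip:port/";
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(base), conf);
boolean moved = fs.rename(new Path(base + "ss.txt"), new Path(base + "user/ss.txt")); // move by renaming to a new parent directory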