HDFS API Usage Notes
The Java abstract class org.apache.hadoop.fs.FileSystem defines Hadoop's filesystem interface. Because the class is abstract, a FileSystem instance is obtained through one of the following static factory methods:
public static FileSystem get(Configuration conf) throws IOException
public static FileSystem get(URI uri, Configuration conf) throws IOException
eg:
String filePath = "hdfs://ip:port/recycle/word.pdf"; // ip is the NameNode address, port is the HDFS port
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(filePath), conf);
Everything that follows accesses and operates on this FileSystem instance fs.
1. Create a directory: public boolean mkdirs(Path f) throws IOException
eg:
String path = "hdfs://ip:port/";
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(path), conf);
boolean flag = fs.mkdirs(new Path(path + "dirName"));
2. Write data (upload a file to HDFS): public FSDataOutputStream create(Path f) throws IOException
eg:
File file = new File("d:\\upload\\ss.txt");
String path = "hdfs://ip:port/ss.txt";
InputStream in = new BufferedInputStream(new FileInputStream(file)); // buffer the local file
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(path), conf); // URI.create(hdfs_path)
OutputStream out = fs.create(new Path(path), new Progressable() {
    @Override
    public void progress() {
        System.out.print("*"); // progress callback: prints a "*" as data is written
    }
});
IOUtils.copyBytes(in, out, 4096, true);
3. Read data (download a file from HDFS):
public FSDataInputStream open(Path f) throws IOException
eg:
String down_path = "hdfs://ip:port/ss.txt";
Configuration conf = new Configuration();
URI uri = URI.create(down_path);
FileSystem fs = FileSystem.get(uri, conf);
FSDataInputStream hdfsInStream = fs.open(new Path(down_path));
This yields the HDFS input stream hdfsInStream.
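The open stream can then be copied out, for example to a local file. A minimal sketch following the same pattern as the upload example; the local destination path d:\\download\\ss.txt is a placeholder, not from the original:

```java
import java.io.FileOutputStream;
import java.io.OutputStream;
import org.apache.hadoop.io.IOUtils;

// hdfsInStream is the FSDataInputStream opened above; local path is a placeholder
OutputStream localOut = new FileOutputStream("d:\\download\\ss.txt");
// the final "true" tells copyBytes to close both streams when the copy finishes
IOUtils.copyBytes(hdfsInStream, localOut, 4096, true);
```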
4. Check whether a file or directory exists: public boolean exists(Path f) throws IOException
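A sketch of guarding an operation with exists(), using the same setup pattern and placeholder path as the earlier examples:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

String path = "hdfs://ip:port/ss.txt"; // placeholder HDFS path
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(path), conf);
// exists() returns true for both files and directories
if (fs.exists(new Path(path))) {
    System.out.println("path exists");
}
```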
5. Delete a file: public boolean delete(Path f, boolean recursive) throws IOException
Permanently deletes the specified file or directory. If f is a file or an empty directory, the value of recursive is ignored; a non-empty directory and its contents are deleted only when recursive is true.
Note: the corresponding shell command rm is a soft delete. If the Hadoop trash (Trash) is enabled, files removed with rm are moved to /user/root/.Trash/Current/, from which they can be restored until they are permanently purged after the configured retention time.
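A sketch of a recursive delete, with a placeholder directory path:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

String path = "hdfs://ip:port/recycle"; // placeholder directory
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(path), conf);
// recursive = true: the directory and all of its contents are removed,
// even if it is non-empty; this bypasses the trash and is permanent
boolean deleted = fs.delete(new Path(path), true);
```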
6. Copy from the local filesystem to HDFS: public void copyFromLocalFile(Path src, Path dst) throws IOException
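A sketch using the same placeholder paths as the upload example; this is a convenience alternative to the manual create()/copyBytes() approach shown in step 2:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create("hdfs://ip:port/"), conf);
// src is a local filesystem path, dst is an HDFS path; both are placeholders
fs.copyFromLocalFile(new Path("d:\\upload\\ss.txt"), new Path("hdfs://ip:port/ss.txt"));
```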
7. Query file status information:
The FileStatus class encapsulates the metadata of files and directories in the filesystem, including file length, block size, replication, modification time, owner, and permissions.
FileSystem.listStatus(Path) returns a FileStatus array covering every entry under a given HDFS directory; each entry's path is available via FileStatus.getPath().
eg:
String dst = "hdfs://ip:port/user"; // list all files under the user directory
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(dst), conf);
FileStatus[] fileList = fs.listStatus(new Path(dst)); // directory path
int size = fileList.length; // number of entries in the directory
for (int i = 0; i < size; i++) {
    System.out.println(fileList[i].getPath().getName()); // file name
    System.out.println(fileList[i].getLen()); // file length in bytes
    System.out.println(fileList[i].getModificationTime()); // last modification time
}
8. Append to a file: public FSDataOutputStream append(Path f) throws IOException
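A sketch of appending to an existing file; note that append must be supported and enabled on the cluster (dfs.support.append on older releases), and the path is a placeholder:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

String path = "hdfs://ip:port/ss.txt"; // placeholder: the file must already exist
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(path), conf);
FSDataOutputStream out = fs.append(new Path(path));
out.write("appended line\n".getBytes("UTF-8")); // bytes are added at the end of the file
out.close();
```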
9. Rename a file: public boolean rename(Path src, Path dst) throws IOException
The HDFS API has no dedicated interface for moving files; rename can be used to achieve this.
Corresponding shell command: mv
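A sketch of using rename() as a move, with placeholder paths:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create("hdfs://ip:port/"), conf);
// "moving" ss.txt into /user is just renaming it to a path under that directory
boolean moved = fs.rename(new Path("/ss.txt"), new Path("/user/ss.txt"));
```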