Garbled Files in the Hadoop HDFS Web UI: HDFS Notes

Reposted

AIGC创想家 2023-12-07 19:48:25


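1. HDFS Overview

1.1 What HDFS Is

HDFS (Hadoop Distributed File System) is Hadoop's distributed file system. It locates files through a directory tree rooted at /. Its design is modeled on the GFS (Google File System) paper published by Google.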

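1.2 Strengths and Weaknesses of HDFS

1.2.1 Strengths

① High fault tolerance: each block is stored as multiple replicas (3 by default). When a replica is lost or a node goes down, the missing replicas are re-created automatically.
② A simple consistency model: write once, read many times. Existing content cannot be modified in place.
③ Suited to big data: in data volume, it handles data at the GB, TB, and even PB scale; in file count, it handles more than a million files.
④ Runs on inexpensive commodity machines.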

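1.2.2 Weaknesses

① Not suited to low-latency data access; millisecond-level reads and writes are out of reach.
② Cannot store large numbers of small files efficiently, because the NameNode keeps the metadata of every file and block in memory.
③ A file can have only one writer at a time; concurrent writes from multiple threads are not allowed. Random modification of a file is not supported either; data can only be appended.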
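
To make the small-files weakness concrete, the sketch below estimates how much NameNode memory the same data volume needs when it is stored as many small files versus a few large ones. It assumes the commonly quoted rule of thumb of roughly 150 bytes of NameNode heap per file or block object; that figure, the class name SmallFileEstimate, and the sample sizes are assumptions for illustration, not measurements from this article.

```java
final class SmallFileEstimate {
    // Assumption: ~150 bytes of NameNode heap per namespace object (rule of thumb).
    static final long BYTES_PER_OBJECT = 150L;

    /** Estimated NameNode heap, in bytes, for fileCount files of fileSize bytes each. */
    static long estimateHeapBytes(long fileCount, long fileSize, long blockSize) {
        // Each file is one object, and each block it occupies is one more object.
        long blocksPerFile = (fileSize + blockSize - 1) / blockSize;
        return fileCount * (1 + blocksPerFile) * BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long blockSize = 128 * mb;
        // The same 10,000,000 MB of data, stored two different ways.
        long small = estimateHeapBytes(10_000_000L, 1 * mb, blockSize);
        long large = estimateHeapBytes(78_125L, 128 * mb, blockSize);
        System.out.println("1 MB files:   " + small + " bytes of metadata");
        System.out.println("128 MB files: " + large + " bytes of metadata");
    }
}
```

Under these assumptions the small-file layout needs over a hundred times more NameNode memory for the same amount of data, which is why packing small files into larger ones is usually worth the effort.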

1.3 HDFS Architecture


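① Client: splits files into blocks; talks to the NameNode to obtain file location information; talks to DataNodes to read or write data.

② NameNode: manages the block mapping; handles client read and write requests; configures the replica policy; manages the HDFS namespace.

③ SecondaryNameNode: serves two purposes, keeping a backup of the namespace image and periodically merging the edit log into the image. It keeps a copy of the merged namespace image, which can be used to help recover the NameNode after a failure. It is not a hot standby and does not take over automatically.

④ DataNode: stores the data blocks sent by the client and performs the actual block reads and writes.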

1.4 HDFS Block Size

The block size is set with the dfs.blocksize parameter in hdfs-site.xml.
The default is 128 MB in Hadoop 2.x and 64 MB in older versions.
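
As a quick illustration of what the block size means for a single file, the following sketch computes how many blocks a file occupies, how large its last block is, and how much raw storage it uses once replicated. The class name BlockMath and the 300 MB sample file are hypothetical; the 128 MB block size and the replication factor of 3 are the defaults mentioned in this article.

```java
final class BlockMath {
    static final long DEFAULT_BLOCK_SIZE = 128L * 1024 * 1024; // 128 MB

    /** Number of blocks needed for a file of fileSize bytes. */
    static long blockCount(long fileSize, long blockSize) {
        return (fileSize + blockSize - 1) / blockSize;
    }

    /** Size in bytes of the last block; it is only as large as the data left over. */
    static long lastBlockSize(long fileSize, long blockSize) {
        if (fileSize == 0) {
            return 0;
        }
        long remainder = fileSize % blockSize;
        return remainder == 0 ? blockSize : remainder;
    }

    /** Raw bytes stored across the cluster for the given replication factor. */
    static long rawStorage(long fileSize, int replication) {
        return fileSize * replication;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long fileSize = 300 * mb; // hypothetical 300 MB file
        System.out.println("blocks:     " + blockCount(fileSize, DEFAULT_BLOCK_SIZE));
        System.out.println("last block: " + lastBlockSize(fileSize, DEFAULT_BLOCK_SIZE) / mb + " MB");
        System.out.println("raw usage:  " + rawStorage(fileSize, 3) / mb + " MB");
    }
}
```

For the 300 MB file this gives 3 blocks (128 MB, 128 MB, 44 MB) and 900 MB of raw storage with 3 replicas.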

2. HDFS Command-Line Operations

Action	Command	Example
Show help for a command	hadoop fs -help	hadoop fs -help ls
List a directory	hadoop fs -ls	hadoop fs -ls /
Create a directory	hadoop fs -mkdir	hadoop fs -mkdir /foodir
Delete a file or directory	hadoop fs -rm	hadoop fs -rm -r -f /foodir
Copy	hadoop fs -cp	hadoop fs -cp /test/1.txt /2.txt
Move	hadoop fs -mv	hadoop fs -mv /2.txt /test/
Show file contents	hadoop fs -cat	hadoop fs -cat /test/1.txt
Move a local file into HDFS	hadoop fs -moveFromLocal	hadoop fs -moveFromLocal ./3.txt /test
Append to a file	hadoop fs -appendToFile	hadoop fs -appendToFile 2.txt /test/1.txt
Copy from local to HDFS	hadoop fs -copyFromLocal or hadoop fs -put	hadoop fs -copyFromLocal 4.txt /test
Copy from HDFS to local	hadoop fs -copyToLocal or hadoop fs -get	hadoop fs -copyToLocal /test/1.txt ./5.txt
Merge and download multiple files	hadoop fs -getmerge	hadoop fs -getmerge /test/* ./result.txt
Show free space	hadoop fs -df	hadoop fs -df -h /
Show directory size	hadoop fs -du	hadoop fs -du -h /
Set the replication factor of a file in HDFS	hadoop fs -setrep	hadoop fs -setrep 2 /test/6.txt

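3. Operating HDFS Through the API

3.1 Getting the File System

// Uses org.apache.hadoop.conf.Configuration, org.apache.hadoop.fs.FileSystem and org.junit.Test
@Test
public void initHDFS() throws Exception {
    // 1 Create the configuration object
    Configuration configuration = new Configuration();

    // 2 Get the file system (the local file system unless fs.defaultFS is configured)
    FileSystem fs = FileSystem.get(configuration);

    // 3 Print the file system
    System.out.println(fs.toString());

    // 4 Release resources
    fs.close();
}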

3.2 Uploading a File to HDFS

Parameter priority: ① values set in client code > ② user-defined configuration files on the classpath > ③ the server's default configuration.

@Test
public void testCopyFromLocalFile() throws IOException, InterruptedException, URISyntaxException {
    // 1 Get the file system
    Configuration configuration = new Configuration();
    // configuration.set("dfs.replication", "2");  // set the replication factor in code
    FileSystem fs = FileSystem.get(new URI("hdfs://Master:9000"), configuration, "root");
    // 2 Upload the file
    fs.copyFromLocalFile(new Path("e:/hello.txt"), new Path("/hello.txt"));
    // 3 Release resources
    fs.close();
}
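
The parameter priority described in this section can be pictured as a layered lookup: the first layer that defines a key wins. The sketch below is a simplified model using plain maps, not Hadoop's Configuration class; the class name ConfigPriority and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ConfigPriority {
    /** Returns the value from the first layer that defines key; layers are ordered highest priority first. */
    static String resolve(String key, List<Map<String, String>> layers) {
        for (Map<String, String> layer : layers) {
            String value = layer.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    /** Builds the three layers (code, classpath file, server default) and resolves key; null means "not set". */
    static String effectiveValue(String key, String inCode, String onClasspath, String serverDefault) {
        List<Map<String, String>> layers = new ArrayList<>();
        for (String value : new String[] {inCode, onClasspath, serverDefault}) {
            Map<String, String> layer = new HashMap<>();
            if (value != null) {
                layer.put(key, value);
            }
            layers.add(layer);
        }
        return resolve(key, layers);
    }

    public static void main(String[] args) {
        // Set in code, in a classpath file, and on the server: the value from code wins.
        System.out.println(effectiveValue("dfs.replication", "2", "1", "3"));
        // Not set in code: the classpath file wins.
        System.out.println(effectiveValue("dfs.replication", null, "1", "3"));
        // Set nowhere on the client: the server default applies.
        System.out.println(effectiveValue("dfs.replication", null, null, "3"));
    }
}
```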

3.3 Downloading a File from HDFS

@Test
public void testCopyToLocalFile() throws IOException, InterruptedException, URISyntaxException {
    // 1 Get the file system
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://Master:9000"), configuration, "root");
    // 2 Download the file
    // boolean delSrc: whether to delete the source file
    // Path src: path of the file to download
    // Path dst: local path to download the file to
    // boolean useRawLocalFileSystem: whether to use RawLocalFileSystem as the local file system;
    //   true skips writing the local .crc checksum file
    fs.copyToLocalFile(false, new Path("/hello1.txt"), new Path("e:/hello1.txt"), true);
    // 3 Release resources
    fs.close();
}
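
The last argument of copyToLocalFile relates to file verification on the local side. As a rough picture of what per-chunk checksums buy you, the sketch below computes one CRC per fixed-size chunk and uses the stored values to locate a corrupted chunk. It uses java.util.zip.CRC32 and 512-byte chunks as assumptions; it does not reproduce Hadoop's actual .crc file format or checksum algorithm, and the class name ChunkChecksums is hypothetical.

```java
import java.util.zip.CRC32;

final class ChunkChecksums {
    static final int BYTES_PER_CHECKSUM = 512; // assumption for this sketch

    /** Computes one CRC32 value per chunk of bytesPerChecksum bytes. */
    static long[] checksums(byte[] data, int bytesPerChecksum) {
        int chunks = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
        long[] sums = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            int offset = i * bytesPerChecksum;
            int length = Math.min(bytesPerChecksum, data.length - offset);
            CRC32 crc = new CRC32();
            crc.update(data, offset, length);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    /** Returns the index of the first chunk that no longer matches, or -1 if all chunks match. */
    static int firstCorruptChunk(byte[] data, long[] expected, int bytesPerChecksum) {
        long[] actual = checksums(data, bytesPerChecksum);
        int common = Math.min(actual.length, expected.length);
        for (int i = 0; i < common; i++) {
            if (actual[i] != expected[i]) {
                return i;
            }
        }
        return actual.length == expected.length ? -1 : common;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1300];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        long[] stored = checksums(data, BYTES_PER_CHECKSUM);
        data[600] ^= 1; // simulate a flipped bit inside the second chunk
        System.out.println("corrupt chunk index: " + firstCorruptChunk(data, stored, BYTES_PER_CHECKSUM));
    }
}
```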

3.4 Viewing File Details

@Test
public void testListFiles() throws IOException, InterruptedException, URISyntaxException {
    // 1 Get the file system
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://Master:9000"), configuration, "root");

    // 2 List files recursively, starting at the root
    RemoteIterator<LocatedFileStatus> listFiles = fs.listFiles(new Path("/"), true);

    while (listFiles.hasNext()) {
      LocatedFileStatus status = listFiles.next();

      // Print the details
      // File name
      System.out.println(status.getPath().getName());
      // Length
      System.out.println(status.getLen());
      // Permissions
      System.out.println(status.getPermission());
      // Group
      System.out.println(status.getGroup());

      // Get the block information
      BlockLocation[] blockLocations = status.getBlockLocations();

      for (BlockLocation blockLocation : blockLocations) {
        // Get the hosts that store this block
        String[] hosts = blockLocation.getHosts();
        for (String host : hosts) {
          System.out.println(host);
        }
      }
    }

    // 3 Release resources
    fs.close();
}

3.5 Telling Files from Directories

@Test
public void testListStatus() throws IOException, InterruptedException, URISyntaxException {
    // 1 Get the file system
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://Master:9000"), configuration, "root");

    // 2 Check whether each entry is a file or a directory
    FileStatus[] listStatus = fs.listStatus(new Path("/"));

    for (FileStatus fileStatus : listStatus) {

      // If it is a file
      if (fileStatus.isFile()) {
        System.out.println(fileStatus.getPath().getName() + " is a file!");
      } else {
        System.out.println(fileStatus.getPath().getName() + " is not a file!");
      }
    }

    // 3 Release resources
    fs.close();
}

4. Operating HDFS Through I/O Streams

4.1 Uploading a File to HDFS

@Test
public void putFileToHDFS() throws IOException, InterruptedException, URISyntaxException {

    // 1 Get the file system
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://Master:9000"), configuration, "root");

    // 2 Create the input stream (local file)
    FileInputStream fis = new FileInputStream(new File("e:/hello.txt"));

    // 3 Get the output stream (HDFS file)
    FSDataOutputStream fos = fs.create(new Path("/hello4.txt"));

    // 4 Connect the streams
    IOUtils.copyBytes(fis, fos, configuration);

    // 5 Release resources
    IOUtils.closeStream(fis);
    IOUtils.closeStream(fos);
    fs.close();
}
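
The stream hand-off in step 4 boils down to a read/write loop over a buffer. The sketch below shows that loop with plain java.io streams, so it runs without a cluster; it is an illustration of the idea rather than Hadoop's own implementation, and the class name StreamCopy is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class StreamCopy {
    /** Copies everything from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        // read() may return fewer bytes than the buffer holds, so write only what was read.
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }

    /** In-memory convenience wrapper used for the demo. */
    static byte[] copyAll(byte[] data, int bufferSize) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            copy(new ByteArrayInputStream(data), out, bufferSize);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] copied = copyAll("hello hdfs".getBytes(), 4);
        System.out.println(new String(copied));
    }
}
```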

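4.2 Downloading a File from HDFS

@Test
public void getFileFromHDFS() throws IOException, InterruptedException, URISyntaxException {
    // 1 Get the file system
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://Master:9000"), configuration, "root");

    // 2 Get the input stream (HDFS file)
    FSDataInputStream fis = fs.open(new Path("/hello4.txt"));

    // 3 Get the output stream; to write to a local file instead of the console:
    // FileOutputStream fos = new FileOutputStream(new File("e:/hello.txt"));

    // 4 Connect the streams and print to the console.
    //   close=false keeps System.out open; the three-argument overload would close it.
    IOUtils.copyBytes(fis, System.out, 4096, false);

    // 5 Release resources
    IOUtils.closeStream(fis);
    fs.close();
}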

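4.3 Reading a File from a Given Position

4.3.1 Downloading the First Block

@Test
public void readFileSeek1() throws IOException, InterruptedException, URISyntaxException {
    // 1 Get the file system
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://Master:9000"), configuration, "root");

    // 2 Get the input stream
    FSDataInputStream fis = fs.open(new Path("/hadoop-2.7.2.tar.gz"));

    // 3 Create the output stream
    FileOutputStream fos = new FileOutputStream(new File("e:/hadoop-2.7.2.tar.gz.part1"));

    // 4 Copy the first 128 MB (one block), writing only the bytes actually read
    byte[] buf = new byte[1024];
    long remaining = 1024L * 1024 * 128;
    int len;
    while (remaining > 0
        && (len = fis.read(buf, 0, (int) Math.min(buf.length, remaining))) != -1) {
      fos.write(buf, 0, len);
      remaining -= len;
    }

    // 5 Release resources
    IOUtils.closeStream(fis);
    IOUtils.closeStream(fos);
    fs.close();
}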

4.3.2 Downloading the Second Block

@Test
public void readFileSeek2() throws IOException, InterruptedException, URISyntaxException {

    // 1 Get the file system
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://Master:9000"), configuration, "root");

    // 2 Open the input stream
    FSDataInputStream fis = fs.open(new Path("/hadoop-2.7.2.tar.gz"));

    // 3 Seek to the start of the second block (128 MB offset)
    fis.seek(1024*1024*128);

    // 4 Create the output stream
    FileOutputStream fos = new FileOutputStream(new File("e:/hadoop-2.7.2.tar.gz.part2"));

    // 5 Copy the rest of the stream
    IOUtils.copyBytes(fis, fos, configuration);

    // 6 Release resources
    IOUtils.closeStream(fis);
    IOUtils.closeStream(fos);
    fs.close();
}
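
The seek-then-copy pattern used for the second block can be tried on an ordinary local file as well. The sketch below uses java.io.RandomAccessFile as a local stand-in for FSDataInputStream; the class name SeekRead and its methods are hypothetical, and Path here is java.nio.file.Path, not Hadoop's Path.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class SeekRead {
    /** Copies the bytes of src from offset to end of file into dst; returns the number of bytes copied. */
    static long copyFrom(Path src, long offset, Path dst) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(src.toFile(), "r");
             OutputStream out = Files.newOutputStream(dst)) {
            in.seek(offset); // jump straight to the requested position
            byte[] buffer = new byte[4096];
            long total = 0;
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
            return total;
        }
    }

    /** Demo helper: writes data to a temporary file and returns everything from offset onward. */
    static byte[] tailOf(byte[] data, long offset) {
        try {
            Path src = Files.createTempFile("seek-src", ".bin");
            Path dst = Files.createTempFile("seek-dst", ".bin");
            try {
                Files.write(src, data);
                copyFrom(src, offset, dst);
                return Files.readAllBytes(dst);
            } finally {
                Files.deleteIfExists(src);
                Files.deleteIfExists(dst);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] tail = tailOf("0123456789".getBytes(), 4);
        System.out.println(new String(tail));
    }
}
```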

4.3.3 Merging the Files

Open the folder that contains the files and type cmd in the search bar.
In the Windows command prompt, run: type hadoop-2.7.2.tar.gz.part2 >> hadoop-2.7.2.tar.gz.part1
Then rename hadoop-2.7.2.tar.gz.part1 to hadoop-2.7.2.tar.gz.
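
If you would rather not depend on the Windows command prompt, the same append-then-rename merge can be done in Java. This is a minimal sketch using java.nio.file; the class name MergeParts, its methods, and the temporary file names in the demo helper are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MergeParts {
    /** Appends part2 to the end of part1, then renames part1 to target. */
    static void appendAndRename(Path part1, Path part2, Path target) throws IOException {
        try (OutputStream out = Files.newOutputStream(part1, StandardOpenOption.APPEND)) {
            Files.copy(part2, out); // stream part2 onto the end of part1
        }
        Files.move(part1, target);
    }

    /** Demo helper: writes two parts to a temporary directory, merges them, returns the merged bytes. */
    static byte[] mergeDemo(byte[] first, byte[] second) {
        try {
            Path dir = Files.createTempDirectory("merge-demo");
            Path part1 = dir.resolve("archive.part1");
            Path part2 = dir.resolve("archive.part2");
            Path whole = dir.resolve("archive");
            Files.write(part1, first);
            Files.write(part2, second);
            appendAndRename(part1, part2, whole);
            byte[] merged = Files.readAllBytes(whole);
            // Clean up the temporary files.
            Files.delete(whole);
            Files.delete(part2);
            Files.delete(dir);
            return merged;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] merged = mergeDemo("first-".getBytes(), "second".getBytes());
        System.out.println(new String(merged));
    }
}
```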
