Hadoop with Java: operating HDFS from Java

Reposted

mob64ca140ee96c 2023-09-21 19:43:29


Hadoop series

Today's post continues the series with HDFS Basics, Part 2: operating HDFS through the Java API.

Contents

Hadoop series
Preface
I. Steps

1. Add the POM dependencies
2. Java code

II. The Archive mechanism
Summary

Preface

This post picks up where the previous HDFS basics post left off: here we use Java to create, delete, modify, and read files on HDFS.


I. Steps

1. Add the POM dependencies

<dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.3.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>3.3.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>3.3.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>3.3.0</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.13</version>
        </dependency>

        <!-- Google Options -->
        <dependency>
            <groupId>com.github.pcj</groupId>
            <artifactId>google-options</artifactId>
            <version>1.0.0</version>
        </dependency>
        <dependency>
            <groupId>commons-io</groupId>
            <artifactId>commons-io</artifactId>
            <version>2.6</version>
        </dependency>
    </dependencies>

2. Java code

import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.net.URI;
import java.util.Arrays;

public class Demo01 {

    FileSystem fileSystem;

    @Before
    public void init() throws Exception {
        fileSystem = FileSystem.get(new URI("hdfs://node1:8020"), new Configuration(),"root");
    }

    @Test
    public void listFiles() throws Exception {
        RemoteIterator<LocatedFileStatus> iterator = fileSystem.listFiles(new Path("/"), true);
        while (iterator.hasNext()){

            // Metadata for one file
            LocatedFileStatus fileStatus = iterator.next();

            // Absolute path of the file
            String path = fileStatus.getPath().toString();
            System.out.println(path);

            // Block size, printed in MB
            long blockSize = fileStatus.getBlockSize();
            System.out.println(blockSize / 1024 / 1024 + "M");

            // Replication factor of the file
            short replication = fileStatus.getReplication();
            System.out.println(replication);

            // Number of blocks in the file
            BlockLocation[] blockLocations = fileStatus.getBlockLocations();
            System.out.println(blockLocations.length);

            // Host names of the nodes holding the replicas of each block
            for (BlockLocation blockLocation : blockLocations) {
                String[] hosts = blockLocation.getHosts();
                System.out.println(Arrays.toString(hosts));
            }
        }
    }

    @Test
    public void mkdir() throws Exception{
        // Create a directory, including any missing parent directories
        fileSystem.mkdirs(new Path("/xxx2/yyy2/zzz2"));
    }

    @Test
    public void download_method1() throws Exception{
        // Download, approach 1: copy the HDFS input stream into a local file
        File file = new File("E:\\111.txt");
        // try-with-resources closes both streams even if the copy fails
        try (FSDataInputStream inputStream = fileSystem.open(new Path("/dir/123.txt"));
             FileOutputStream outputStream = new FileOutputStream(file)) {
            IOUtils.copy(inputStream, outputStream);
        }
    }

    @Test
    public void download_method2() throws Exception{
        // Download, approach 2: let FileSystem copy the file to the local disk
        fileSystem.copyToLocalFile(new Path("/dir/123.txt"),new Path("E:\\222.txt"));

    }

    @Test
    public void upload() throws Exception {
        // Upload a local file to HDFS
        fileSystem.copyFromLocalFile(new Path("E:\\222.txt"),new Path("/dir/222.txt"));
    }

    @Test
    public void delete() throws Exception{
        // Delete a file or directory; "true" makes the delete recursive
        fileSystem.delete(new Path("/xxx/"),true);
    }

    @Test
    public void appendToFile() throws Exception{
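        // Merge small local files into a single file and upload it to HDFS
        File[] files = new File("E:\\file").listFiles();
        if (files == null) {
            // listFiles() returns null when the path is not a readable directory
            throw new java.io.IOException("Cannot list E:\\file");
        }
        try (FSDataOutputStream outputStream = fileSystem.create(new Path("/dir/big.txt"))) {
            for (File localFile : files) {
                if (!localFile.isFile()) continue; // skip subdirectories
                try (FileInputStream inputStream = new FileInputStream(localFile)) {
                    IOUtils.copy(inputStream, outputStream);
                }
            }
        }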
    }

    @After
    public void close() throws Exception{
        fileSystem.close();
    }
}
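
The listFiles test prints the block size and the number of blocks for each file. As a rough illustration of how those two numbers relate, here is a minimal sketch in plain Java. BlockMath and blockCount are hypothetical names made up for this example, not part of the Hadoop API, and the 128 MB block size is an assumption (it is the usual default in Hadoop 3.x, but a cluster can be configured differently).

```java
// Sketch only: BlockMath is a hypothetical helper, not a Hadoop class.
final class BlockMath {

    // Number of blocks needed for a file of fileLength bytes when each block
    // holds at most blockSize bytes (ceiling division; an empty file has 0 blocks).
    static long blockCount(long fileLength, long blockSize) {
        if (fileLength < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and blockSize > 0");
        }
        long fullBlocks = fileLength / blockSize;
        long remainder = fileLength % blockSize;
        return remainder == 0 ? fullBlocks : fullBlocks + 1;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 128 * mb; // assumed block size
        System.out.println(blockCount(300 * mb, blockSize)); // prints 3 (128 + 128 + 44 MB)
        System.out.println(blockCount(1024, blockSize));     // prints 1: a 1 KB file still needs one block
    }
}
```

The second call hints at why many small files are a poor fit for HDFS: every file, however small, needs at least one block entry of its own.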

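II. The Archive mechanism

1. An archive (HAR) file is a packed file; packing does not compress the contents.
2. After archiving, each small file inside the archive can still be accessed transparently.
3. Archives mainly address the fact that HDFS is poorly suited to storing many small files.
4. Creating an archive runs as a MapReduce job.
5. After archiving, the original files are still kept.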
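
The first two points (packing without compression, yet still being able to read each small file) can be pictured with a toy model in plain Java. This is only a sketch of the idea: ToyArchive is a hypothetical class invented here, and it does not reproduce the real HAR layout, which stores its index and data in separate files inside the .har directory.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only, NOT the real HAR format: one concatenated "part" plus an index.
final class ToyArchive {
    private final byte[] part;              // all file contents, concatenated unchanged
    private final Map<String, int[]> index; // file name -> {offset, length} within part

    private ToyArchive(byte[] part, Map<String, int[]> index) {
        this.part = part;
        this.index = index;
    }

    // Pack the files back to back and remember where each one starts and how long it is.
    static ToyArchive pack(Map<String, byte[]> files) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Map<String, int[]> index = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> entry : files.entrySet()) {
            byte[] content = entry.getValue();
            index.put(entry.getKey(), new int[] {out.size(), content.length});
            out.write(content, 0, content.length);
        }
        return new ToyArchive(out.toByteArray(), index);
    }

    // Read one file back by looking up its offset and length in the index.
    byte[] read(String name) {
        int[] location = index.get(name);
        if (location == null) {
            throw new IllegalArgumentException("No such file in archive: " + name);
        }
        return Arrays.copyOfRange(part, location[0], location[0] + location[1]);
    }

    // Total packed size; equals the sum of the input sizes because nothing is compressed.
    int size() {
        return part.length;
    }

    public static void main(String[] args) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("core-site.xml", "<core/>".getBytes(StandardCharsets.UTF_8));
        files.put("hdfs-site.xml", "<hdfs/>".getBytes(StandardCharsets.UTF_8));
        ToyArchive archive = pack(files);
        System.out.println(archive.size()); // prints 14 (7 + 7 bytes, no compression)
        System.out.println(new String(archive.read("hdfs-site.xml"), StandardCharsets.UTF_8)); // prints <hdfs/>
    }
}
```

The benefit in this picture is that many small entries collapse into a few large stored objects, while the index keeps every original file individually addressable.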

0. Prepare the data
hadoop fs -mkdir /config
cd /export/server/hadoop-3.3.0/etc/hadoop
hadoop fs -put *.xml /config

1. Create an archive
# Archive (pack) every file under /config into test.har, and store test.har in the /outputdir directory
hadoop archive -archiveName test.har -p /config  /outputdir

2. View the raw contents of the packed archive
hadoop fs -cat /outputdir/test.har/part-0

3. List the names of all the small files in the archive
hadoop fs -ls har://hdfs-node1:8020/outputdir/test.har
hadoop fs -ls har:///outputdir/test.har     # shorthand that works when the client is itself one of the cluster hosts

4. View the contents of one small file in the archive
hadoop fs -cat har:///outputdir/test.har/core-site.xml
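
The two URI forms used in steps 3 and 4 follow a simple pattern: the fully qualified form puts the underlying file system scheme and the NameNode host, joined by a hyphen, into the authority part, while the shorthand leaves the authority empty. The sketch below only builds those strings with java.net.URI; HarUris and its methods are hypothetical names invented for this example, and nothing here contacts a cluster.

```java
import java.net.URI;

// Sketch only: HarUris is a hypothetical helper, not part of Hadoop.
final class HarUris {

    // Fully qualified form: har://<scheme>-<host>:<port><archivePath>[/<fileInArchive>]
    static URI qualified(String scheme, String host, int port,
                         String archivePath, String fileInArchive) {
        return URI.create("har://" + scheme + "-" + host + ":" + port
                + join(archivePath, fileInArchive));
    }

    // Shorthand form with an empty authority: har://<archivePath>[/<fileInArchive>]
    static URI shorthand(String archivePath, String fileInArchive) {
        return URI.create("har://" + join(archivePath, fileInArchive));
    }

    // Append the inner file name, if any, to the absolute path of the .har archive.
    private static String join(String archivePath, String fileInArchive) {
        if (!archivePath.startsWith("/")) {
            throw new IllegalArgumentException("archivePath must be absolute: " + archivePath);
        }
        if (fileInArchive == null || fileInArchive.isEmpty()) {
            return archivePath;
        }
        return archivePath + "/" + fileInArchive;
    }

    public static void main(String[] args) {
        // prints har://hdfs-node1:8020/outputdir/test.har
        System.out.println(qualified("hdfs", "node1", 8020, "/outputdir/test.har", null));
        // prints har:///outputdir/test.har/core-site.xml
        System.out.println(shorthand("/outputdir/test.har", "core-site.xml"));
    }
}
```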