Custom MapReduce output file names: MapReduce produces no output file

Reposted

mob64ca14068b0b 2024-02-29 22:56:23

Tags: custom MapReduce output file names, mrUnit, MapReduce, big data, Text. Category: architecture, back-end development

1. The problem

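In the previous program, some input lines were never processed. Reading the code and working through the data by hand turned up nothing wrong, so I wanted to find the cause by debugging. On Windows, however, we cannot run the MapReduce program directly, so we use Apache MRUnit, a unit-testing tool for MapReduce.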

2. Adding the dependency

The dependency snippet is available at https://mvnrepository.com/artifact/org.apache.mrunit/mrunit/1.1.0, but we also need to add a <classifier>hadoop2</classifier> element that matches our own Hadoop version.

To add MRUnit, put the following into pom.xml and click Import Changes (in the Event Log):

<dependency>
    <groupId>org.apache.mrunit</groupId>
    <artifactId>mrunit</artifactId>
    <version>1.1.0</version>
    <classifier>hadoop2</classifier>
    <scope>test</scope>
</dependency>

Once the dependency has been imported, the project shows the folder pictured below. Our tests go inside that folder.

[Image: project view showing the test folder]

3. Writing the code (the code is explained in detail in the previous article)

(1) Mapper code

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;


public class FriendCountMapper extends Mapper<LongWritable, Text,Text, FriendCountBean> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String data = value.toString();
        String[] userAndFriends = data.split("\t");
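        // Split the line into the user and that user's friend list; skip lines that do not have exactly two fields
        if (userAndFriends.length != 2){
            System.err.println("Line " + key.toString() + " could not be parsed");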
            return;
        }
        String user = userAndFriends[0];
        String[] friends = userAndFriends[1].split(",");
        for (String friend:friends) {
            context.write(new Text(user), new FriendCountBean(friend, true));
            context.write(new Text(friend), new FriendCountBean(user, true));
        }
        for (int i = 0;i < friends.length; i++){
            for (int j = i+1 ;j < friends.length; j++){
                context.write(new Text(friends[i]), new FriendCountBean(friends[j], false));
                context.write(new Text(friends[j]), new FriendCountBean(friends[i], false));
            }
        }
    }
}
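To make the mapper's output easier to reason about while debugging, here is a minimal plain-Java sketch of the same emit logic, with no Hadoop classes involved. The class name `FriendPairSketch`, the method `emit`, and the `key -> name:isFriend` string format are all hypothetical and exist only for illustration; the splitting and looping mirror the mapper above.

```java
import java.util.ArrayList;
import java.util.List;

public class FriendPairSketch {
    // Mirrors the mapper's logic for one input line, but collects
    // "key -> name:isFriend" strings instead of writing to a Hadoop Context.
    public static List<String> emit(String line) {
        List<String> out = new ArrayList<>();
        String[] userAndFriends = line.split("\t");
        if (userAndFriends.length != 2) {
            return out; // same early return as the mapper: the line is skipped
        }
        String user = userAndFriends[0];
        String[] friends = userAndFriends[1].split(",");
        // Direct friendships, emitted in both directions.
        for (String friend : friends) {
            out.add(user + " -> " + friend + ":true");
            out.add(friend + " -> " + user + ":true");
        }
        // Every pair of this user's friends shares a common friend.
        for (int i = 0; i < friends.length; i++) {
            for (int j = i + 1; j < friends.length; j++) {
                out.add(friends[i] + " -> " + friends[j] + ":false");
                out.add(friends[j] + " -> " + friends[i] + ":false");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // One user "A" with friends "B" and "C" yields six records.
        for (String record : emit("A\tB,C")) {
            System.out.println(record);
        }
    }
}
```

Note that a line such as "A\t" (a user followed by an empty friend list) produces nothing: Java's `String.split` drops trailing empty strings, so the array has one element and the length check skips the line.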

(2) FriendCountBean (no need to study this one closely)

import org.apache.hadoop.io.Writable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;


public class FriendCountBean implements Writable {
    String name;
    boolean isFriend;

    public FriendCountBean(String name, boolean isFriend) {
        this.name = name;
        this.isFriend = isFriend;
    }

    public FriendCountBean() {
    }

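    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        FriendCountBean that = (FriendCountBean) o;
        return isFriend == that.isFriend &&
                name.equals(that.name);
    }

    // hashCode must agree with equals: equal beans return the same hash.
    @Override
    public int hashCode() {
        return 31 * name.hashCode() + (isFriend ? 1 : 0);
    }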


    public String getName() {
        return name;
    }

    public boolean isFriend() {
        return isFriend;
    }

    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeUTF(name);
        dataOutput.writeBoolean(isFriend);
    }

    public void readFields(DataInput dataInput) throws IOException {
        this.name = dataInput.readUTF();
        this.isFriend = dataInput.readBoolean();
    }
}

Note: if you want to compare the actual output against expected results, a custom class must implement both equals and hashCode.
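The reason behind this note can be shown without Hadoop at all. The sketch below uses a hypothetical nested class `Bean` as a stand-in for FriendCountBean (the Writable methods are left out) and checks that two separately constructed objects with the same field values count as equal, both directly and inside a HashSet. Without the two overrides, both checks would fall back to object identity and return false.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class EqualsHashCodeSketch {
    // Hypothetical stand-in for FriendCountBean, without the Writable methods.
    static final class Bean {
        final String name;
        final boolean isFriend;

        Bean(String name, boolean isFriend) {
            this.name = name;
            this.isFriend = isFriend;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            Bean that = (Bean) o;
            return isFriend == that.isFriend && name.equals(that.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, isFriend);
        }
    }

    // Two distinct objects with the same field values compare as equal.
    public static boolean sameValue() {
        return new Bean("B", true).equals(new Bean("B", true));
    }

    // Hash-based collections find the element only if hashCode agrees with equals.
    public static boolean foundInSet() {
        Set<Bean> seen = new HashSet<>();
        seen.add(new Bean("B", true));
        return seen.contains(new Bean("B", true));
    }

    public static void main(String[] args) {
        System.out.println(sameValue());  // prints true
        System.out.println(foundInSet()); // prints true
    }
}
```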

(3) Test code (explanations are in the comments)

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

import java.io.*;

public class FriendCountTest {
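    // A nice property of MRUnit: to test a map, a reduce, or a full map-reduce,
    // we only need to create the matching Driver.
    // Here I only want to test the map step, so I only create a MapDriver.
    private MapDriver<LongWritable, Text, Text, FriendCountBean> mapDriver;

    // Binds our Mapper class to the driver; the @Before annotation is required.
    @Before
    public void setUp() throws Exception {
        mapDriver = MapDriver.newMapDriver(new FriendCountMapper());
    }

    // The actual test method. To run it, right-click inside the method body and choose Run:
    // there is no main method, so the class has no ordinary entry point.
    @Test
    public void testMap() {
        // The test data lives in this txt file; an absolute path is best avoided (I used one for convenience).
        File dataFile = new File("C:\\Users\\admin\\IdeaProjects\\hadoop-bootstrap\\src\\test\\data\\social_data.txt");
        long count = 1L;
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(new FileReader(dataFile))) {
            String line;
            while ((line = reader.readLine()) != null) { // add every line of the file as a test input
                // The key and value types must match the first two type parameters of our Mapper.
                // Likewise, withOutput would have to match the Mapper's last two type parameters.
                mapDriver.withInput(new LongWritable(count), new Text(line));
                count++;
            }
            try {
                // A single call runs the mapper over every input added above.
                mapDriver.runTest();
            } catch (Throwable throwable) {
                // No expected output was declared, so runTest reports an error;
                // every such error is a Throwable, so it is caught and ignored here.
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }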
}

We can then set breakpoints and debug the program just like any ordinary Java program.
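Alongside breakpoints, it can help to list exactly which lines the mapper's length check would skip. The following is a minimal sketch under that assumption: the class `SkippedLineFinder`, the method `skippedLines`, and the sample data are hypothetical, and only the `split("\t").length != 2` condition is taken from the mapper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkippedLineFinder {
    // Returns the 1-based numbers of the lines that the mapper's
    // "userAndFriends.length != 2" check would reject.
    public static List<Integer> skippedLines(List<String> lines) {
        List<Integer> skipped = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).split("\t").length != 2) {
                skipped.add(i + 1);
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "A\tB,C",   // well formed
                "D",        // no tab at all
                "E\t",      // user with an empty friend list
                "F\tG\tH",  // too many tabs
                "I J,K");   // space used instead of a tab
        System.out.println(skippedLines(sample)); // prints [2, 3, 4, 5]
    }
}
```

Running the same check over the real data file would show whether the unprocessed lines are rejected by this condition or are lost somewhere else.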

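In the end, however, running this test did not reproduce the error. My guess is that the file encoding on the Hadoop cluster differs from the encoding used on Windows and that this difference caused the problem, in which case changing the encoding should fix it.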
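One way to take the platform encoding out of the picture in a local test is to read the data with an explicit charset. `new FileReader(file)` decodes with the JVM's default charset, which on older Java versions follows the Windows locale, whereas Hadoop's Text stores its contents as UTF-8. The sketch below assumes the data file is UTF-8, which the article does not confirm; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class Utf8ReadSketch {
    // Writes the content to a temporary file as UTF-8 bytes (demo helper only).
    public static Path writeTempUtf8(String content) {
        try {
            Path tmp = Files.createTempFile("social_data", ".txt");
            Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads all lines with an explicit UTF-8 decoder instead of the platform default.
    public static List<String> readUtf8Lines(Path file) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Non-ASCII user and friend names, written as Unicode escapes.
        Path tmp = writeTempUtf8("\u5F20\u4E09\t\u674E\u56DB,\u738B\u4E94\n");
        List<String> lines = readUtf8Lines(tmp);
        System.out.println(lines.get(0).split("\t").length); // prints 2
        Files.delete(tmp);
    }
}
```

In the test above, the same idea would mean replacing `new BufferedReader(new FileReader(dataFile))` with `Files.newBufferedReader(dataFile.toPath(), StandardCharsets.UTF_8)`.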
