Serialization Analysis:

Serialization and deserialization are the conversions between structured objects and byte streams; they are used mainly for interprocess communication and persistent storage.
For communication between its nodes Hadoop uses RPC: the RPC protocol serializes a message into a binary byte stream and sends it to the remote node, which deserializes the byte stream back into the original message. RPC serialization should satisfy the following four points:
1. Compact: a compact format makes the best use of limited network bandwidth.
2. Fast: interprocess communication is the high-speed backbone of a distributed system, so serialization and deserialization must add as little overhead as possible.
3. Extensible: if the server adds a new parameter for new clients, old clients should continue to work unchanged.
4. Interoperable: clients written in multiple languages should be supported.

Hadoop's own serialization format is the family of classes that implement the Writable interface.
Writable satisfies only the first two points, compact and fast; it is not easy to extend and does not work across languages.
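
As a quick, minimal sketch of the compact point (not from the original post): round-tripping an IntWritable through an in-memory byte stream shows the Writable contract in action, and the int serializes to exactly 4 bytes.

package SerializableTest;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;

public class WritableRoundTrip {
    public static void main(String[] args) throws IOException {
        IntWritable original = new IntWritable(163);

        // serialize: Writable.write(DataOutput)
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        original.write(new DataOutputStream(buffer));
        byte[] bytes = buffer.toByteArray(); // exactly 4 bytes for an int

        // deserialize: Writable.readFields(DataInput)
        IntWritable restored = new IntWritable();
        restored.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));

        System.out.println(bytes.length + " bytes, value = " + restored.get());
    }
}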

(Figure: the Writable class hierarchy)


Sometimes the built-in Writable classes do not meet our needs, and we have to write a custom Writable to make a JavaBean serializable.

Let's work through a playing-card example.

Problem description:



Suppose three face cards (J, Q, K in poker) are removed from a deck. How can we use MapReduce to find which suits are missing a card? The idea: the mapper emits (suit, card) for every face card (number > 10), the reducer counts the face cards per suit, and any suit with fewer than three is missing one.

Input file (red = hearts, rect = diamonds, black = spades, flower = clubs):

red-2
red-3
red-4
red-5
red-6
red-7
red-8
red-9
red-10
red-11
red-13

rect-1
rect-2
rect-3
rect-4
rect-5
rect-6
rect-7
rect-8
rect-9
rect-10
rect-11
rect-12

black-1
black-2
black-3
black-4
black-5
black-6
black-7
black-8
black-9
black-10
black-11
black-12
black-13

flower-1
flower-2
flower-3
flower-4
flower-5
flower-6
flower-7
flower-8
flower-9
flower-10
flower-12
flower-13

The JavaBean (CardBean):

package SerializableTest;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class CardBean implements Writable{
    private String kind;
    private int number;
    public String getKind() {
        return kind;
    }
    public void setKind(String kind) {
        this.kind = kind;
    }
    public int getNumber() {
        return number;
    }
    public void setNumber(int number) {
        this.number = number;
    }
    @Override
    public void write(DataOutput out) throws IOException {
        // serialize the fields to the byte stream
        out.writeUTF(kind);
        out.writeInt(number);
    }
    @Override
    public void readFields(DataInput in) throws IOException {
        // the read order must match the write order exactly
        kind = in.readUTF();
        number = in.readInt();
    }

}

Because CardBean is used only as a value, implementing Writable is enough; a type used as a key would also have to implement WritableComparable, as sketched below.
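
As a hypothetical sketch (not needed for this job): if CardBean had to serve as a key, it would look roughly like this, with compareTo() added so the framework can sort and group keys.

package SerializableTest;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

// hypothetical key variant of CardBean; not used in this example
public class CardKey implements WritableComparable<CardKey> {
    private String kind;
    private int number;

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(kind);
        out.writeInt(number);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        kind = in.readUTF();
        number = in.readInt();
    }

    @Override
    public int compareTo(CardKey other) {
        // order by suit first, then by card number
        int cmp = kind.compareTo(other.kind);
        return cmp != 0 ? cmp : Integer.compare(number, other.number);
    }

    @Override
    public int hashCode() {
        // keys should hash consistently so HashPartitioner
        // sends equal keys to the same reducer
        return kind.hashCode() * 31 + number;
    }
}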

Mapper:

package SerializableTest;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class PokerMapper extends Mapper<LongWritable, Text, Text, CardBean>{
    @Override
    protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, CardBean>.Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String[] strs = line.split("-");
        if(strs.length==2){
            CardBean cardBean = new CardBean();
            cardBean.setKind(strs[0]);
            cardBean.setNumber( Integer.valueOf(strs[1]));
            if(cardBean.getNumber()>10){
                // face cards (number > 10) are forwarded to the reducer
                context.write(new Text(strs[0]), cardBean);
            }
        }
    }
}
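
A note on the design: creating a new Text and CardBean for every record works, but idiomatic Hadoop code usually allocates the Writable objects once as fields and reuses them inside map(), since context.write() copies the serialized bytes immediately.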

Reducer:

package SerializableTest;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class PokerReduce extends Reducer<Text, CardBean, Text, LongWritable>{
    @Override
    protected void reduce(Text key, Iterable<CardBean> iter,
            Reducer<Text, CardBean, Text, LongWritable>.Context context) throws IOException, InterruptedException {
        // count the face cards received for this suit
        int count = 0;
        for (CardBean card : iter) {
            count++;
        }
        // a full suit has three face cards; fewer means one is missing
        if (count < 3) {
            context.write(key, new LongWritable(count));
        }
    }
}
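
Note that the Iterable passed to reduce() can only be traversed once; repeated calls to iter.iterator() do not restart it, so a single for-each pass is the safe way to consume the values.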

Runner:

package SerializableTest;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.Job;

public class PokerRunner {

    public static void main(String[] args) {
        if (args.length != 2) {
            System.out.println("Usage: PokerRunner <input path> <output path>");
            System.exit(-1);
        }
        Configuration conf = new Configuration();
        try {
            // Job.getInstance() creates a new job bound to this configuration
            Job job = Job.getInstance(conf);

            // set the classes for the runner, mapper and reducer
            job.setJarByClass(PokerRunner.class);
            job.setMapperClass(PokerMapper.class);
            job.setReducerClass(PokerReduce.class);
            // set the map output key/value types
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(CardBean.class);
            // set the final (reduce) output key/value types
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);

            FileInputFormat.setInputPaths(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            job.waitForCompletion(true);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

}
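
To run the job, package the classes into a jar and submit it with the standard hadoop jar command; the jar name and HDFS paths below are placeholders:

hadoop jar poker.jar SerializableTest.PokerRunner /input/cards.txt /output/poker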

Output file:

red 2
rect 2
flower 2
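
black does not appear in the output because spades still has all three face cards (count == 3); the other three suits each report a count of 2, so each of them is missing exactly one face card.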