Environment setup


Hadoop 2.7 + CentOS 7 + Hive 1.2.1 + VirtualBox + Xshell + Eclipse + JDK 1.8

Data preparation:

  1. Start the Hadoop cluster and Hive:
# start-dfs.sh
# source /etc/profile   (Note: my cluster setup is probably misconfigured somewhere; I have to run this command every time before starting Hive.)
# hive


  2. Create the table:
create table littlebigdata(
name string,
email string,
bday string,
ip string,
gender string,
anum int)
row format delimited fields terminated by ',';
  3. Load the data:
load data local inpath '/root/data/data6' into table littlebigdata;

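Change "/root/data/data6" to the path of the data file you want to load. Because of the LOCAL keyword, this is a path on the local filesystem of the machine running the Hive CLI, not an HDFS path.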
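The row format delimited fields terminated by ',' clause tells Hive to split each line of the data file on commas and assign the pieces to the six columns in order. The plain-Java sketch below imitates that mapping for a single line. The sample line and the class name DelimitedRow are hypothetical (the contents of data6 are not shown here), and it only approximates Hive's default text handling: missing trailing fields become NULL, extra fields are ignored, and a non-numeric anum becomes NULL.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DelimitedRow {
    // Column order taken from the CREATE TABLE statement
    static final String[] COLUMNS = {"name", "email", "bday", "ip", "gender", "anum"};

    // Split one text line on ',' and assign the fields to the columns in order.
    // Missing trailing fields become null; fields beyond the sixth are ignored.
    public static Map<String, Object> parse(String line) {
        String[] parts = line.split(",", -1); // -1 keeps trailing empty fields
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < COLUMNS.length; i++) {
            String value = i < parts.length ? parts[i] : null;
            if (COLUMNS[i].equals("anum")) {
                row.put("anum", toIntOrNull(value)); // int column
            } else {
                row.put(COLUMNS[i], value);          // string columns
            }
        }
        return row;
    }

    // A value that is not a valid int becomes null instead of failing the row
    static Integer toIntOrNull(String s) {
        if (s == null) {
            return null;
        }
        try {
            return Integer.valueOf(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample line, birthday assumed to be month-day-year
        String line = "alice,alice@example.com,2-12-1981,192.0.2.10,F,10";
        System.out.println(parse(line));
    }
}
```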

The data is now ready. Next, write your own UDF.

Writing the UDF:

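Open Eclipse and create the class below. The project needs the Hive jar that provides org.apache.hadoop.hive.ql.exec.UDF and Description (hive-exec) on its build path.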

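package hivejar;

import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;

@Description(name = "zodiac",
    value = "_FUNC_(date) - from the input date string "
          + "or separate month and day arguments, returns the sign of the Zodiac.",
    extended = "Example:\n"
             + "  > SELECT _FUNC_(date_string) FROM src;\n"
             + "  > SELECT _FUNC_(month, day) FROM src;")
public class Zodiac extends UDF {
    // Parses birthdays written as month-day-year, e.g. 2-12-1981
    private final SimpleDateFormat df;

    public Zodiac() {
        df = new SimpleDateFormat("MM-dd-yyyy");
    }

    // Date argument: Date.getMonth() is 0-based and Date.getDay() is the day of the
    // week, so read the month (1-12) and the day of month through Calendar instead
    public String evaluate(Date bday) {
        if (bday == null) {
            return null;
        }
        Calendar cal = Calendar.getInstance();
        cal.setTime(bday);
        return this.evaluate(cal.get(Calendar.MONTH) + 1, cal.get(Calendar.DAY_OF_MONTH));
    }

    // String argument: parse it, then reuse the Date overload;
    // unparseable input returns NULL
    public String evaluate(String bday) {
        if (bday == null) {
            return null;
        }
        Date date;
        try {
            date = df.parse(bday);
        } catch (Exception ex) {
            return null;
        }
        return this.evaluate(date);
    }

    // Month and day arguments. Only January and February are handled here;
    // every other month returns NULL
    public String evaluate(Integer month, Integer day) {
        if (month == null || day == null) {
            return null;
        }
        if (month == 1) {
            return day < 20 ? "Capricorn" : "Aquarius";
        }
        if (month == 2) {
            return day < 19 ? "Aquarius" : "Pisces";
        }
        return null;
    }
}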
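Before packaging, the date handling can be checked without a Hive installation. One pitfall to watch for: java.util.Date.getMonth() is 0-based and Date.getDay() returns the day of the week (0 = Sunday), not the day of the month, so passing getDay() into evaluate(month, day) picks the sign from the weekday. This standalone sketch (hypothetical class ZodiacCheck, no Hive dependency) parses a birthday with the same MM-dd-yyyy pattern, reads the month and day of month through Calendar, and applies the same January/February rules.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

public class ZodiacCheck {
    // Same pattern as the UDF: month-day-year, e.g. "2-12-1981"
    private static final String PATTERN = "MM-dd-yyyy";

    // Returns {month (1-12), day of month}, or null if the input is null or unparseable.
    // Calendar is used because Date.getMonth() is 0-based and Date.getDay() is the weekday.
    public static int[] monthAndDay(String bday) {
        if (bday == null) {
            return null;
        }
        try {
            Date date = new SimpleDateFormat(PATTERN).parse(bday);
            Calendar cal = Calendar.getInstance();
            cal.setTime(date);
            return new int[] {cal.get(Calendar.MONTH) + 1, cal.get(Calendar.DAY_OF_MONTH)};
        } catch (ParseException e) {
            return null; // mirrors the UDF returning NULL for bad input
        }
    }

    // Same January/February rules as the UDF; other months return null
    public static String sign(int month, int day) {
        if (month == 1) {
            return day < 20 ? "Capricorn" : "Aquarius";
        }
        if (month == 2) {
            return day < 19 ? "Aquarius" : "Pisces";
        }
        return null;
    }

    @SuppressWarnings("deprecation")
    public static void main(String[] args) throws ParseException {
        String bday = "02-20-1990";
        Date date = new SimpleDateFormat(PATTERN).parse(bday);
        // getDay() is the weekday; only getDate() (or Calendar) gives the 20th
        System.out.println("getDay()  = " + date.getDay());
        System.out.println("getDate() = " + date.getDate());
        int[] md = monthAndDay(bday);
        System.out.println("month=" + md[0] + ", day=" + md[1] + " -> " + sign(md[0], md[1]));
    }
}
```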

Then export the class as a jar and upload it to the virtual machine.
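Here zodiac.jar is the exported jar.
In the Hive session, add the jar to the classpath and register the class as a temporary function (a temporary function lasts only for the current session, so these commands must be rerun in each new session):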

hive> add jar /root/data/zodiac.jar;
hive> create temporary function zodiac
    > as 'hivejar.Zodiac';

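Adjust the path to match your own setup. To check that the function is registered and see its documentation, run:

hive> describe function extended zodiac;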
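The text printed by describe function extended comes from the @Description annotation on the class: Hive shows value (and, for extended, the extended text) with _FUNC_ replaced by the name the function was registered under. The sketch below is a rough imitation using reflection; FuncDoc and AnnotatedUdf are hypothetical stand-ins, since Hive's own Description is not on the classpath here, and the exact output format of Hive may differ. An annotation must be retained at runtime to be readable this way.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class DescribeSketch {
    // Hypothetical stand-in for Hive's @Description, retained at runtime
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface FuncDoc {
        String name();
        String value();
        String extended() default "";
    }

    @FuncDoc(name = "zodiac",
        value = "_FUNC_(date) - from the input date string "
              + "or separate month and day arguments, returns the sign of the Zodiac.",
        extended = "Example:\n  > SELECT _FUNC_(date_string) FROM src;")
    public static class AnnotatedUdf { }

    // Build the describe text, substituting the registered name for _FUNC_
    public static String describe(Class<?> cls, String funcName, boolean extended) {
        FuncDoc doc = cls.getAnnotation(FuncDoc.class);
        if (doc == null) {
            return "(no documentation for " + funcName + ")"; // placeholder for this sketch
        }
        String text = doc.value().replace("_FUNC_", funcName);
        if (extended && !doc.extended().isEmpty()) {
            text += "\n" + doc.extended().replace("_FUNC_", funcName);
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println(describe(AnnotatedUdf.class, "zodiac", true));
    }
}
```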

The function is now ready to use:

hive> select name, bday, zodiac(bday) from littlebigdata;
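Because evaluate(month, day) covers only January and February, rows whose birthday falls in any other month (or does not parse as month-day-year) come back as NULL in the third column. Below is a table-driven sketch of a full twelve-sign version. The cutoff days are the commonly cited ones, and sources differ by a day for some signs; the class name ZodiacTable is hypothetical. The same logic could replace the two if blocks inside evaluate(Integer month, Integer day).

```java
public class ZodiacTable {
    // Day of month on which the sign in SIGN_STARTING begins, indexed by month - 1.
    // Commonly cited cutoffs; some sources shift a few of them by one day.
    private static final int[] START_DAY = {20, 19, 21, 20, 21, 21, 23, 23, 23, 23, 22, 22};

    // Sign that begins in each month, indexed by month - 1
    private static final String[] SIGN_STARTING = {
        "Aquarius", "Pisces", "Aries", "Taurus", "Gemini", "Cancer",
        "Leo", "Virgo", "Libra", "Scorpio", "Sagittarius", "Capricorn"
    };

    // Same contract as evaluate(Integer, Integer): null for null or out-of-range input
    public static String sign(Integer month, Integer day) {
        if (month == null || day == null || month < 1 || month > 12 || day < 1 || day > 31) {
            return null;
        }
        int i = month - 1;
        if (day >= START_DAY[i]) {
            return SIGN_STARTING[i];
        }
        // Before the cutoff, the sign that began in the previous month still applies
        return SIGN_STARTING[(i + 11) % 12];
    }

    public static void main(String[] args) {
        System.out.println(sign(1, 19) + ", " + sign(2, 19) + ", " + sign(12, 22));
    }
}
```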
