Below is a sample dataset, which we place in a file named littlebigdata.txt in the user's home directory (/root in this example):

edward capriolo,edward@media6degrees.com,1981-02-12,209.191.139.200,M,10
bob,bob@test.net,2004-10-10,10.10.10.1,M,50
sara connor,sara@sky.net,1974-05-04,64.64.5.1,F,2

Create a table named littlebigdata and load the sample data into it:

create table if not exists littlebigdata(
name string,
email string,
bday string,
ip string,
gender string,
anum int)
row format delimited fields terminated by ',';

load data local inpath '/root/littlebigdata.txt' into table littlebigdata;
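
To make the column mapping concrete, here is a minimal Java sketch that splits one sample line on commas and pairs each field with the column declared in the create table statement. It is only an illustration: the class SampleRowSketch and its splitRow method are hypothetical, and Hive's own row parsing is not reproduced here.

```java
// Hypothetical helper, used only to illustrate how fields map to columns.
final class SampleRowSketch {
    // Column names in the order declared by the create table statement.
    static final String[] COLUMNS = {"name", "email", "bday", "ip", "gender", "anum"};

    // Splits one line on ','; the -1 limit keeps trailing empty fields.
    static String[] splitRow(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        String line = "edward capriolo,edward@media6degrees.com,1981-02-12,209.191.139.200,M,10";
        String[] fields = splitRow(line);
        for (int i = 0; i < COLUMNS.length; i++) {
            System.out.println(COLUMNS[i] + " = " + fields[i]);
        }
        // anum is declared as int, so the last field must parse as an integer.
        int anum = Integer.parseInt(fields[5]);
        System.out.println("anum as int = " + anum);
    }
}
```

The third field, bday, is the yyyy-MM-dd string that the UDF receives as its argument.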

Next, write the Java class that takes a date and returns the corresponding zodiac sign as a string:

package com.lyz.hadoop.hive.udf;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.joda.time.DateTime;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

import java.util.Date;

/**
 * @author liuyazhuang
 * Takes a date string in yyyy-MM-dd format and returns the zodiac sign for that date.
 */
@Description(name = "zodiac_cn"
        , value = "_FUNC_(date) - from the input date string or separate month and day arguments, returns the sign of the Zodiac."
        , extended = "Example:\n > select _FUNC_(date_string) from src;\n > select _FUNC_(month, day) from src;")
public class UDFZodiacSignCn extends UDF {
    // The input date format is fixed: yyyy-MM-dd
    public final static DateTimeFormatter DEFAULT_DATE_FORMATTER = DateTimeFormat.forPattern("yyyy-MM-dd");

    private Text result = new Text();

    public UDFZodiacSignCn() {
    }

    public Text evaluate(Text birthday) {
        DateTime dateTime = null;
        try {
            dateTime = DateTime.parse(birthday.toString(), DEFAULT_DATE_FORMATTER);
        } catch (Exception e) {
            // A null or malformed date yields NULL instead of failing the query
            return null;
        }

        return evaluate(dateTime.toDate());
    }

    public Text evaluate(Date birthday) {
        DateTime dateTime = new DateTime(birthday);
        return evaluate(new IntWritable(dateTime.getMonthOfYear()), new IntWritable(dateTime.getDayOfMonth()));
    }

    public Text evaluate(IntWritable month, IntWritable day) {
        // NULL or out-of-range input yields NULL instead of an exception
        if (month == null || day == null || month.get() < 1 || month.get() > 12) {
            return null;
        }
        result.set(getZodiac(month.get(), day.get()));
        return result;
    }

    private String getZodiac(int month, int day) {
        // Chinese sign names, in order from Capricorn to Sagittarius
        String[] zodiacArray = {"魔羯座", "水瓶座", "双鱼座", "白羊座", "金牛座", "双子座", "巨蟹座", "狮子座",
                "处女座", "天秤座", "天蝎座", "射手座"};
        int[] splitDay = {19, 18, 20, 20, 20, 21, 22, 22, 22, 22, 21, 21}; // cutoff day between two signs, one per month
        int index = month;
        // On or before the cutoff day, step the index back by one; otherwise keep it
        // (late December wraps around to index 0)
        if (day <= splitDay[month - 1]) {
            index = index - 1;
        } else if (month == 12) {
            index = 0;
        }
        // Return the sign name at that index
        return zodiacArray[index];
    }

    public static void main(String[] args) {
        UDFZodiacSignCn udfZodiacSignCn = new UDFZodiacSignCn();
        System.out.println("1990-11-02: " + udfZodiacSignCn.evaluate(new Text("1990-11-02")));
        // A malformed date returns null
        System.out.println(udfZodiacSignCn.evaluate(new Text("19901102")));
        System.out.println("2000-11-02: " + udfZodiacSignCn.evaluate(new Text("2000-11-02")));
        System.out.println("2000-01-02: " + udfZodiacSignCn.evaluate(new Text("2000-01-02")));
    }
}
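
The split-day lookup in getZodiac is easiest to check at the boundaries between signs. The sketch below is a standalone approximation, not the UDF itself: the class name ZodiacLookupSketch and its signFor methods are hypothetical, English sign names stand in for the Chinese ones, and java.time's ISO parser is assumed in place of Joda-Time (the two may differ on edge-case inputs).

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Hypothetical standalone class: it mirrors the UDF's lookup table
// but has no Hive or Joda-Time dependency.
final class ZodiacLookupSketch {
    // English names in the same order as zodiacArray in the UDF.
    private static final String[] SIGNS = {
        "Capricorn", "Aquarius", "Pisces", "Aries", "Taurus", "Gemini",
        "Cancer", "Leo", "Virgo", "Libra", "Scorpio", "Sagittarius"};
    // Same cutoff days as splitDay in the UDF, one per month.
    private static final int[] SPLIT_DAY = {19, 18, 20, 20, 20, 21, 22, 22, 22, 22, 21, 21};

    static String signFor(int month, int day) {
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("month must be 1..12: " + month);
        }
        // On or before the cutoff: the sign that began in the previous month.
        if (day <= SPLIT_DAY[month - 1]) {
            return SIGNS[month - 1];
        }
        // After the cutoff: the next sign; % 12 wraps late December to Capricorn.
        return SIGNS[month % 12];
    }

    // Returns null for a null or malformed date string, as the UDF does.
    static String signFor(String isoDate) {
        try {
            LocalDate d = LocalDate.parse(isoDate); // expects yyyy-MM-dd
            return signFor(d.getMonthValue(), d.getDayOfMonth());
        } catch (DateTimeParseException | NullPointerException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(signFor(1, 19));        // last day before the January cutoff
        System.out.println(signFor(1, 20));        // first day after it
        System.out.println(signFor(12, 21));
        System.out.println(signFor(12, 22));       // wraps around to the first sign
        System.out.println(signFor("1981-02-12"));
        System.out.println(signFor("2004-10-10"));
        System.out.println(signFor("1974-05-04"));
        System.out.println(signFor("19901102"));   // malformed input
    }
}
```

Run as written, the three dates from the sample dataset map to Aquarius, Libra, and Taurus, which correspond to 水瓶座, 天秤座, and 金牛座, and the malformed string prints null.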

Note: I created this as a Maven project, with the following dependencies:

<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.9.2</version>
    </dependency>

    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>3.1.1</version>
    </dependency>
</dependencies>

After writing the class, export it from Eclipse as udf.jar.

Upload the JAR to the /usr/local/src directory on the server.

Then run the following in the Hive command line:

hive> add jar /usr/local/src/udf.jar;
hive> create temporary function zodiac as 'com.lyz.hadoop.hive.udf.UDFZodiacSignCn';

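Note the temporary keyword in the create function statement. A function declared this way is valid only in the current session, so you have to add the JAR and create the function again in every session. If you use the same JAR and function frequently, you can put these statements in the $HOME/.hiverc file.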

hive> describe function zodiac;
OK
zodiac(date) - from the input date string or separate month and day arguments, returns the sign of the Zodiac.
Time taken: 0.017 seconds, Fetched: 1 row(s)

hive> describe function extended zodiac;
OK
zodiac(date) - from the input date string or separate month and day arguments, returns the sign of the Zodiac.
Example:
> select zodiac(date_string) from src;
> select zodiac(month, day) from src;
Function class:com.lyz.hadoop.hive.udf.UDFZodiacSignCn
Function type:TEMPORARY
Time taken: 0.017 seconds, Fetched: 6 row(s)

hive> select name, bday, zodiac(bday) from littlebigdata;
OK
edward capriolo 1981-02-12 水瓶座
bob 2004-10-10 天秤座
sara connor 1974-05-04 金牛座
Time taken: 0.137 seconds, Fetched: 3 row(s)

When you are done with the custom UDF, you can drop the function with the following command:

hive> drop temporary function if exists zodiac;