Source: https://liudongdong.top/archives/hiveshi-liu-hive-zhi-zi-ding-yi-udtf-han-shu
1. Custom UDTF Function
Requirement: implement a custom UDTF that splits a string on an arbitrary separator into individual words, for example:
hive (default)> select myudtf("hello,world,hadoop,hive", ",");
hello
world
hadoop
hive
Code implementation:
package com.learn.hive;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import java.util.ArrayList;
import java.util.List;
public class MyWordSplit extends GenericUDTF {

    // Reused output row: cleared and refilled for each forwarded word
    private final List<String> list = new ArrayList<String>();

    /**
     * Initialization: declare the output column name and type.
     *
     * @param argOIs object inspectors describing the input arguments
     * @return a struct object inspector describing the output rows
     * @throws UDFArgumentException
     */
    @Override
    public StructObjectInspector initialize(StructObjectInspector argOIs) throws UDFArgumentException {
        List<String> fieldNames = new ArrayList<String>();
        List<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
        fieldNames.add("word");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
    }

    @Override
    public void process(Object[] args) throws HiveException {
        // Expect exactly two non-null arguments: the input string and the separator
        if (null == args || args.length != 2 || null == args[0] || null == args[1]) {
            throw new HiveException("myudtf expects two non-null arguments: a string and a separator");
        }
        String word = args[0].toString();
        String splitKey = args[1].toString();
        String[] split = word.split(splitKey);
        for (String wd : split) {
            list.clear();
            list.add(wd);
            // Emit one output row per word
            forward(list);
        }
    }

    @Override
    public void close() throws HiveException {
    }
}

Package the compiled class as a jar and upload it to the server:
/opt/module/hive/datas/myudtf.jar
Add the jar to Hive's classpath:
hive (default)> add jar /opt/module/hive/datas/myudtf.jar;
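To check which jars are currently registered in the session, Hive's list jars command can be used (a quick verification step, not part of the original walkthrough):
hive (default)> list jars;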
Create a temporary function associated with the developed Java class:
hive (default)> create temporary function myudtf as "com.learn.hive.MyWordSplit";
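To confirm the function is registered and bound to the right class, describe it (again just an optional check, not from the original post):
hive (default)> desc function myudtf;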
Use the custom function:
hive (default)> select myudtf("hello,world,hadoop,hive",",");
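In real queries a UDTF is usually combined with LATERAL VIEW so that each generated word can be joined back to its source row. A minimal sketch, assuming a hypothetical table lines with a string column sentence (the table and column names are illustrative, not from the original):
hive (default)> select word from lines lateral view myudtf(sentence, ",") tmp as word;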