Source of this post: https://liudongdong.top/archives/hiveshi-liu-hive-zhi-zi-ding-yi-udtf-han-shu

Source of this series: https://liudongdong.top/categories/hive



I. Custom UDTF Function

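  1. Requirement: write a custom UDTF that splits a string on an arbitrary delimiter into individual words, emitting one word per row. For example:

    hive (default)> select myudtf("hello,world,hadoop,hive", ",");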
    hello
    world
    hadoop
    hive
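
The requirement speaks of an arbitrary delimiter, while Java's `String.split` interprets its argument as a regular expression, so delimiters such as `|` or `.` do not behave as plain text unless they are quoted. The following is a minimal, Hive-free sketch of the expected splitting behaviour, assuming the delimiter should be matched literally. The class `DelimiterSplitSketch` and the method `splitLiteral` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; it is not part of the UDTF itself.
class DelimiterSplitSketch {

    // Splits text on the delimiter taken literally, not as a regular expression.
    static List<String> splitLiteral(String text, String delimiter) {
        List<String> words = new ArrayList<>();
        for (String word : text.split(Pattern.quote(delimiter))) {
            words.add(word);
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [hello, world, hadoop, hive]
        System.out.println(splitLiteral("hello,world,hadoop,hive", ","));
        // prints [hello, world]
        System.out.println(splitLiteral("hello|world", "|"));
        // Unquoted, "|" is a regex alternation of two empty patterns,
        // so the string is not split at the pipe character alone.
        System.out.println(Arrays.toString("hello|world".split("|")));
    }
}
```

Quoting is only one possible design choice: if callers are expected to pass regular expressions on purpose, the delimiter can be handed to `split` unchanged.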

  2. Implementation

    package com.learn.hive;

    import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
    import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Pattern;

    public class MyWordSplit extends GenericUDTF {

        // Reusable one-column output row; cleared and refilled for every word.
        private final List<String> list = new ArrayList<String>();

        /**
         * Initializes the function and declares the output schema:
         * a single string column named "word".
         *
         * @param argOIs object inspectors describing the input arguments
         * @return object inspector describing the output rows
         * @throws UDFArgumentException declared by the GenericUDTF contract; not thrown here
         */
        @Override
        public StructObjectInspector initialize(StructObjectInspector argOIs) throws UDFArgumentException {
            List<String> fieldNames = new ArrayList<String>();
            List<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();

            fieldNames.add("word");
            fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);

            return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
        }

        /**
         * Splits args[0] on the delimiter args[1] and forwards one row per word.
         */
        @Override
        public void process(Object[] args) throws HiveException {
            if (null == args || args.length != 2 || null == args[0] || null == args[1]) {
                throw new HiveException("myudtf expects two non-null arguments: the string and the delimiter");
            }
            String word = args[0].toString();
            String splitKey = args[1].toString();

            // Quote the delimiter so that regex metacharacters such as "|" or "."
            // are matched literally; String.split treats its argument as a regex.
            String[] split = word.split(Pattern.quote(splitKey));
            for (String wd : split) {
                list.clear();
                list.add(wd);
                forward(list);
            }
        }

        @Override
        public void close() throws HiveException {
            // No resources to release.
        }
    }
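
Because `process` builds its rows from `String.split`, the rows the function emits follow that method's handling of empty fields: trailing empty strings are dropped by default, while leading and interior ones are kept. The sketch below illustrates this in plain Java, without Hive, under the assumption that the delimiter is matched literally. `EmptyFieldSketch`, `splitDropTrailing`, and `splitKeepAll` are hypothetical names, and whether empty words should be emitted as rows is a design decision the original requirement does not settle.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; it is not part of the UDTF itself.
class EmptyFieldSketch {

    // Default behaviour of String.split: trailing empty fields are discarded.
    static String[] splitDropTrailing(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    // A negative limit keeps every field, including trailing empty ones.
    static String[] splitKeepAll(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        // prints [a, b]
        System.out.println(Arrays.toString(splitDropTrailing("a,b,,", ",")));
        // prints [a, b, , ]
        System.out.println(Arrays.toString(splitKeepAll("a,b,,", ",")));
        // A leading empty field is kept in both variants: prints [, a]
        System.out.println(Arrays.toString(splitDropTrailing(",a", ",")));
    }
}
```

If empty words are unwanted in the output, one option is to skip them inside the loop before calling `forward`.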

  3. Package the class as a jar and upload it to the server

    /opt/module/hive/datas/myudtf.jar
  4. Add the jar to Hive's classpath

    hive (default)> add jar /opt/module/hive/datas/myudtf.jar;
  5. Create a temporary function bound to the compiled Java class

    hive (default)> create temporary function myudtf as "com.learn.hive.MyWordSplit";
  6. Call the custom function

    hive (default)> select myudtf("hello,world,hadoop,hive",",");