hive 看function使用 hive invalid function replace

转载

jowvid 2023-07-20 18:34:26

文章标签 hive 看function使用 Hive hive bc h5 文章分类 Hive 大数据

hive自定义函数

1、自定义函数的基本介绍

Hive 自带了一些函数，比如：max/min等，但是数量有限，自己可以通过自定义UDF来方便的扩展。
当Hive提供的内置函数无法满足你的业务处理需要时，此时就可以考虑使用用户自定义函数（UDF：user-defined function）
根据用户自定义函数类别分为以下三种：

UDF（User-Defined-Function）一进一出
UDAF（User-Defined Aggregation Function）聚集函数，多进一出，类似于：count/max/min
UDTF（User-Defined Table-Generating Functions）一进多出，如lateral view explode()

如lateral view explode()

官方文档地址
https://cwiki.apache.org/confluence/display/Hive/HivePlugins
编程步骤：

（1）继承org.apache.hadoop.hive.ql.UDF

（2）需要实现evaluate函数；evaluate函数支持重载；

注意事项

（1）UDF必须要有返回类型，可以返回null，但是返回类型不能为void；

（2）UDF中常用Text/LongWritable等类型，不推荐使用java类型；

2、自定义函数开发

第一步：创建maven java 工程，并导入jar包

<repositories>
        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
        </repository>
    </repositories>
    
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.6.0-mr1-cdh5.14.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.6.0-cdh5.14.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.6.0-cdh5.14.2</version>
        </dependency>

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>2.6.0-cdh5.14.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>1.1.0-cdh5.14.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>1.1.0-cdh5.14.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-cli</artifactId>
            <version>1.1.0-cdh5.14.2</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <encoding>UTF-8</encoding>
                    <!--    <verbal>true</verbal>-->
                </configuration>
            </plugin>
        </plugins>
    </build>

第二步：开发java类继承UDF，并重载evaluate 方法

package com.kkb.udf.MyUDF;

public class MyUDF extends UDF {
     public Text evaluate(final Text s) {
         if (null == s) {
             return null;
         }
         //返回大写字母         
         return new Text(s.toString().toUpperCase());
     }
 }

第三步：项目打包

使用maven的package进行打包
将我们打包好的jar包上传到node03服务器的/kkb/install/hive-1.1.0-cdh5.14.2/lib 这个路径下

第四步：添加我们的jar包

重命名我们的jar包名称

cd /kkb/install/hive-1.1.0-cdh5.14.2/lib
mv original-day_hive_udf-1.0-SNAPSHOT.jar udf.jar

hive的客户端添加我们的jar包

0: jdbc:hive2://node03:10000> add jar /kkb/install/hive-1.1.0-cdh5.14.2/lib/udf.jar;

第五步：设置函数与我们的自定义函数关联

0: jdbc:hive2://node03:10000> create temporary function touppercase as 'com.kkb.udf.MyUDF';

第六步：使用自定义函数

0: jdbc:hive2://node03:10000>select tolowercase('abc');

hive当中如何创建永久函数
在hive当中添加临时函数，需要我们每次进入hive客户端的时候都需要添加以下，退出hive客户端临时函数就会失效，那么我们也可以创建永久函数来让其不会失效
创建永久函数

-- 1、指定数据库，将我们的函数创建到指定的数据库下面
0: jdbc:hive2://node03:10000>use myhive;

-- 2、使用add jar添加我们的jar包到hive当中来
0: jdbc:hive2://node03:10000>add jar /kkb/install/hive-1.1.0-cdh5.14.2/lib/udf.jar;

-- 3、查看我们添加的所有的jar包
0: jdbc:hive2://node03:10000>list jars;

-- 4、创建永久函数，与我们的函数进行关联
0: jdbc:hive2://node03:10000>create function myuppercase as 'com.kkb.udf.MyUDF';

-- 5、查看我们的永久函数
0: jdbc:hive2://node03:10000>show functions like 'my*';

-- 6、使用永久函数
0: jdbc:hive2://node03:10000>select myhive.myuppercase('helloworld');

-- 7、删除永久函数
0: jdbc:hive2://node03:10000>drop function myhive.myuppercase;

-- 8、查看函数
 show functions like 'my*';

Json数据解析UDF开发（作业）

有原始json数据格式内容如下：

{"movie":"1193","rate":"5","timeStamp":"978300760","uid":"1"} 
{"movie":"661","rate":"3","timeStamp":"978302109","uid":"1"}
{"movie":"914","rate":"3","timeStamp":"978301968","uid":"1"}
{"movie":"3408","rate":"4","timeStamp":"978300275","uid":"1"}
{"movie":"2355","rate":"5","timeStamp":"978824291","uid":"1"}
{"movie":"1197","rate":"3","timeStamp":"978302268","uid":"1"}
{"movie":"1287","rate":"5","timeStamp":"978302039","uid":"1"}