hive max函数用法 hive函数的使用

转载

代码工匠传奇 2023-07-08 15:53:07

文章标签 hive max函数用法字符串 java hive函数值类型 文章分类 Hive 大数据

（1）concat()
（2）concat_ws()
（3）instr(string,substr)
（4）length(string)
（5）locate(substr,str,pos)
（6）lower(string) /upper(string)
（7）regexp_replace(x,y,z)
（8）split(string,x)
（9）substr(string,x,y)/substring(string,x,y)
（10）trim(string)
（11）str_to_map(text[, delimiter1, delimiter2])

2.2、类型转换函数

（1）cast(expr as )
（2）binary(string)

2.3、数学函数

（1）round(x,y)
（2）ceil(x)
（3）floor(x)
（4）rand(x)
（5）power(x,n)
（6）abs(x)

2.4、日期函数

（1）from_unixtime(unixtime,"yyyy-MM-dd HH:mm:ss")
（2）unix_timestamp("yyyy-MM-dd HH:mm:ss")
（3）to_date("yyyy-MM-dd HH:mm:ss")
（4）year("yyyy-MM-dd HH:mm:ss")
（5）month("yyyy-MM-dd HH:mm:ss")
（6）day("yyyy-MM-dd HH:mm:ss")
（7）hour("yyyy-MM-dd HH:mm:ss")
（8）minute("yyyy-MM-dd HH:mm:ss")
（9）second("yyyy-MM-dd HH:mm:ss")
（10）weekofyear("yyyy-MM-dd HH:mm:ss")
（11）datediff(enddate,startdate)
（12）date_add(startdate,days)
（13）date_sub(startdate,days)
（14）current_date()
（15）current_timestamp
（16）date_format("yyyy-MM-dd HH:mm:ss","MM-dd HH:mm:ss")

2.5、集合函数

（1）size(map)
（2）size(array)
（3）map_keys(map)
（4）map_values(map)
（5）array_contains(array, value)
（6）sort_array(array)

2.6、条件函数

（1）if(x, y, z)
（2）case when
（3）isnull
（4）isnotnull
（5）nvl(x,y)
（6）coalesce

2.7、聚合函数
2.8、表生成函数

（1）explode(array)
（2）explode(map)
（3）posexplode(array)
（4）stack(int n, v_1, v_2, ..., v_k)

3、自定义函数

3.1、Hive UDF函数

（1）开发流程
（2）UDF的实现

4、窗口函数

1、Hive函数分类

1.1、从输入输出角度分类

标准函数：一行数据中的一列或多列为输入，结果为单一值
聚合函数：多行的零列到多列为输入，结果为单一值
表生成函数：零个或多个输入，结果为多列或多行

1.2、从实现方式分类

内置函数

字符函数
类型转换函数
数学函数
日期函数
集合函数
条件函数
聚合函数
表生成函数

自定义函数

UDF：自定义标准函数
UDAF：自定义聚合函数
UDTF：自定义表生成函数

2、内置函数

2.1、字符函数

concat	concat_ws
instr	length
locate	regexp_replace
lower	upper
split	substr
trim	str_to_map

（1）concat()

用于拼接字符串，每个字符串之间都需要添加指定分隔符
返回值类型：string

示例：
拼接表中的 name 和 dept_num 字段，并添加一个 name 列

select concat("name",":",name,"--",dept_num) as con from employee_contract;

运行结果如下：

+---------------------+--+
|         con         |
+---------------------+--+
| name:Michael--1000  |
| name:Will--1000     |
| name:Wendy--1000    |
| name:Steven--1000   |
| name:Lucy--1000     |
| name:Lily--1001     |
| name:Jess--1001     |
| name:Mike--1001     |
| name:Wei--1002      |
| name:Yun--1002      |
| name:Richard--1002  |
+---------------------+--+

（2）concat_ws()

拼接字符串，第一个参数为分隔符，其后都为需要拼接的字符串
返回值类型：string

示例：
多行转一行，查找出每个部门的员工姓名

select dept_num,
concat_ws(",",collect_list(name)) as con_name
from employee_contract
group by dept_num;

运行结果如下：

+-----------+---------------------------------+--+
| dept_num  |            con_name             |
+-----------+---------------------------------+--+
| 1000      | Michael,Will,Wendy,Steven,Lucy  |
| 1001      | Lily,Jess,Mike                  |
| 1002      | Wei,Yun,Richard                 |
+-----------+---------------------------------+--+

（3）instr(string,substr)

返回子字符串 substr 在 string 中的位置，若不存在则返回 0
返回值类型：int

示例：
判断一个字符串是否在这一列中存在，查找 employee 表中 name 字段中含有 i 的信息

select * from employee where instr(name,"i")>0;

运行结果如下：

+----------------+-------------------------+--------------------------+------------------------+---------------------------------------+--+
| employee.name  |   employee.work_place   |     employee.sex_age     | employee.skills_score  |         employee.depart_title         |
+----------------+-------------------------+--------------------------+------------------------+---------------------------------------+--+
| Michael        | ["Montreal","Toronto"]  | {"sex":"Male","age":30}  | {"DB":80}              | {"Product":["Developer","Lead"]}      |
| Will           | ["Montreal"]            | {"sex":"Male","age":35}  | {"Perl":85}            | {"Product":["Lead"],"Test":["Lead"]}  |
+----------------+-------------------------+--------------------------+------------------------+---------------------------------------+--+

（4）length(string)

返回字符串长度
返回值类型：int

示例：
返回字符串长度

select length("abcdefg");

运行结果如下：

+------+--+
| _c0  |
+------+--+
| 7    |
+------+--+

（5）locate(substr,str,pos)

查找字符串 str 中的 pos 位置后字符串 substr 第一次出现的位置
返回值类型：int

示例：
查找abc:hello word a b c中第四个字符字以后a的位置

select locate("a","abc:hello word a b c",4);

运行结果如下：

+------+--+
| _c0  |
+------+--+
| 16   |
+------+--+

（6）lower(string) /upper(string)

将字符串中的所有字母转换成小写/大写字母
返回值类型：string

示例：

select upper("Hello word");
select lower("Hello WORD");

运行结果如下：

+-------------+--+
|     _c0     |
+-------------+--+
| HELLO WORD  |
+-------------+--+

+-------------+--+
|     _c0     |
+-------------+--+
| hello word  |
+-------------+--+

（7）regexp_replace(x,y,z)

替换字符串中的指定字段
第一个参数 x 为字符串
第二种参数 y 为需要替换的字符
第三个参数 z 为替换成的字符
返回值类型：string

示例：
将字符串中的一个或多个空格替换为，

select regexp_replace("hello word I Love you","\\s",",");

运行结果如下：

+------------------------+--+
|          _c0           |
+------------------------+--+
| hello,word,I,Love,you  |
+------------------------+--+

（8）split(string,x)

以指定的 x 为分隔符切割字符串
返回值类型：array

示例：
按逗号切割自定义字符串

select split("hello word,I Love you,I'am a student",",");

运行结果如下：

+-----------------------------------------------+--+
|                      _c0                      |
+-----------------------------------------------+--+
| ["hello word","I Love you","I'am a student"]  |
+-----------------------------------------------+--+

（9）substr(string,x,y)/substring(string,x,y)

截取字符串
从第 x 位开始，截取长度为 y 的字符串
返回值类型：string

示例：

select substr("2020-12-14 10:17:30",0,7);
select substring("2020-12-14 10:17:30",0,7);

运行结果如下：

+----------+--+
|   _c0    |
+----------+--+
| 2020-12  |
+----------+--+

（10）trim(string)

去掉字符串前后的空格
返回值类型：string

示例：

select trim("    abcds   ");

运行结果如下：

+--------+--+
|  _c0   |
+--------+--+
| abcds  |
+--------+--+

（11）str_to_map(text[, delimiter1, delimiter2])

将字符串str按照指定分隔符转换成Map
返回值类型：map

示例：

select str_to_map("name:ZS,age:24",",");

运行结果如下：

+---------------------------+--+
|            _c0            |
+---------------------------+--+
| {"name":"ZS","age":"24"}  |
+---------------------------+--+

2.2、类型转换函数

（1）cast(expr as )

将 expr 转换成 type 类型
返回值类型：type

示例：

select cast("1.02" as int);

运行结果如下：

+------+--+
| _c0  |
+------+--+
| 1    |
+------+--+

（2）binary(string)

将输入的值转换成二进制
返回值类型：binary

示例：

select binary('8');

运行结果如下：

+------+--+
| _c0  |
+------+--+
| 8    |
+------+--+

2.3、数学函数

round	ceil
floor	rand
power	abs

（1）round(x,y)

对 x 进行四舍五入取整，保留 y 位小数，不写 y 表示不保留小数
返回值类型：double/binary

示例：

select round(5.55);

运行结果如下：

+------+--+
| _c0  |
+------+--+
| 6.0  |
+------+--+

（2）ceil(x)

小数向上取整
返回值类型：bigint

示例：

select ceil(1.2);

运行结果如下：

+------+--+
| _c0  |
+------+--+
| 2    |
+------+--+

（3）floor(x)

小数向下取整
返回值类型：bigint

示例：

select floor(5.55);

运行结果如下：

+------+--+
| _c0  |
+------+--+
| 5    |
+------+--+

（4）rand(x)

返回一个 0 到 1 的 double 型随机数
x是随机因子，指定随机因子x，则会等到一个稳定的随机数序列
返回值类型：double

示例1：

select rand();

运行结果如下：

+---------------------+--+
|         _c0         |
+---------------------+--+
| 0.8360579048475161  |
+---------------------+--+

示例2：
求一个 1 到 10 的随机数

select round(rand()*10);

运行结果如下：

+------+--+
| _c0  |
+------+--+
| 5.0  |
+------+--+

（5）power(x,n)

求指定 x 的 n 次方
返回值类型：double

示例：

select power(10,3);

运行结果如下：

+---------+--+
|   _c0   |
+---------+--+
| 1000.0  |
+---------+--+

（6）abs(x)

求 x 的绝对值
返回值类型：double

示例：

select abs(-9.250);
select abs(9.250);

运行结果如下：

+-------+--+
|  _c0  |
+-------+--+
| 9.25  |
+-------+--+

2.4、日期函数

from_unixtime	unix_timestamp
to_date	year
month	day
hour	minute
second	weekofyear
datediff	date_add
date_sub	current_date
current_timestamp	date_format

（1）from_unixtime(unixtime,“yyyy-MM-dd HH:mm:ss”)

将时间戳转换成format格式，时间格式可选，不选则为默认时间格式
返回值类型：string

示例：
时间戳转化为时间，使用指定的时间格式

select from_unixtime(1607914665,"yyyy-MM-dd");

运行结果如下：

+-------------+--+
|     _c0     |
+-------------+--+
| 2020-12-14  |
+-------------+--+

（2）unix_timestamp(“yyyy-MM-dd HH:mm:ss”)

将格式为"yyyy-MM-dd HH:mm:ss"的时间字符串转换成时间戳
无参数时输系统当前时间的时间戳
返回值类型：int/bigint

示例：

select unix_timestamp();

运行结果如下：

+-------------+--+
|     _c0     |
+-------------+--+
| 1607931757  |
+-------------+--+

（3）to_date(“yyyy-MM-dd HH:mm:ss”)

返回时间字符串的日期部分
返回值类型：string

示例：

select to_date("2020-12-14 11:29:59");

运行结果如下：

+-------------+--+
|     _c0     |
+-------------+--+
| 2020-12-14  |
+-------------+--+

（4）year(“yyyy-MM-dd HH:mm:ss”)

返回时间字符串中的年份
返回值类型：int

示例：

select year("2020-12-14 11:29:59");

运行结果如下：

+-------+--+
|  _c0  |
+-------+--+
| 2020  |
+-------+--+

（5）month(“yyyy-MM-dd HH:mm:ss”)

返回时间字符串中的月份
返回值类型：int

示例：

select month("2020-12-14 11:29:59");

运行结果如下：

+------+--+
| _c0  |
+------+--+
| 12   |
+------+--+

（6）day(“yyyy-MM-dd HH:mm:ss”)

返回时间字符串中的日期（号数）
返回值类型：int

示例：

select day("2020-12-14 11:29:59");

运行结果如下：

+------+--+
| _c0  |
+------+--+
| 14   |
+------+--+

（7）hour(“yyyy-MM-dd HH:mm:ss”)

返回时间字符串中的小时数
返回值类型：int

示例：

select hour("2020-12-14 11:29:59");

运行结果如下：

+------+--+
| _c0  |
+------+--+
| 11   |
+------+--+

（8）minute(“yyyy-MM-dd HH:mm:ss”)

返回时间字符串中的分钟数
返回值类型：int

示例：

select minute("2020-12-14 11:29:59");

运行结果如下：

+------+--+
| _c0  |
+------+--+
| 29   |
+------+--+

（9）second(“yyyy-MM-dd HH:mm:ss”)

返回时间字符串中的秒数
返回值类型：int

示例：

select second("2020-12-14 11:29:59");

运行结果如下：

+------+--+
| _c0  |
+------+--+
| 59   |
+------+--+

（10）weekofyear(“yyyy-MM-dd HH:mm:ss”)

返回时间在当年是第几周
返回值类型：int

示例：

select weekofyear("2020-12-14 11:29:59");

运行结果如下：

+------+--+
| _c0  |
+------+--+
| 51   |
+------+--+

（11）datediff(enddate,startdate)

计算开始时间到结束时间相差的天数
返回值类型：int

示例：

select datediff("2020-12-14","2020-12-01");

运行结果如下：

+------+--+
| _c0  |
+------+--+
| 13   |
+------+--+

（12）date_add(startdate,days)

从开始时间 startdate 加上 days 后的日期
返回值类型：string

示例：

select date_add("2020-12-14 11:45:58",3);

运行结果如下：

+-------------+--+
|     _c0     |
+-------------+--+
| 2020-12-17  |
+-------------+--+

（13）date_sub(startdate,days)

从开始时间 startdate 减去 days 后的日期
返回值类型：string

示例：

select date_sub("2020-12-14 11:45:58",3);

运行结果如下：

+-------------+--+
|     _c0     |
+-------------+--+
| 2020-12-11  |
+-------------+--+

（14）current_date()

返回当前时间的日期
返回值类型：date

示例：

select current_date();

运行结果如下：

+-------------+--+
|     _c0     |
+-------------+--+
| 2020-12-14  |
+-------------+--+

（15）current_timestamp

返回当前时间，精确到毫秒
返回值类型：timestamp

示例：

select current_timestamp();

运行结果如下：

+--------------------------+--+
|           _c0            |
+--------------------------+--+
| 2020-12-14 11:44:58.431  |
+--------------------------+--+

（16）date_format(“yyyy-MM-dd HH:mm:ss”,“MM-dd HH:mm:ss”)

按指定格式返回时间date
返回值类型：string

示例：

select date_format("2020-12-14 11:45:58","MM-dd HH:mm:ss");

运行结果如下：

+-----------------+--+
|       _c0       |
+-----------------+--+
| 12-14 11:45:58  |
+-----------------+--+

2.5、集合函数

size(Map<K,V>)	size(Array)
map_keys(Map<K,V>)	map_values(Map<K,V>)
array_contains(Array, value)	sort_array(Array)

（1）size(map)

返回 map 中键值对个数
返回值类型：int

示例：

select size((map("ZS",25,"LS",22,"WW",36)));

运行结果如下：

+------+--+
| _c0  |
+------+--+
| 3    |
+------+--+

（2）size(array)

返回数组的元素数量
返回值类型：int

示例：

select size(array(1,2,3,4,5));

运行结果如下：

+------+--+
| _c0  |
+------+--+
| 5    |
+------+--+

（3）map_keys(map)

返回 map 中的所有 key
返回值类型：array<k>

示例：

select map_keys((map("ZS",25,"LS",22,"WW",36)));

运行结果如下：

+-------------------+--+
|        _c0        |
+-------------------+--+
| ["ZS","LS","WW"]  |
+-------------------+--+

（4）map_values(map)

返回 map 中的所有 values
返回值类型：array<V>

示例：

select map_values((map("ZS",25,"LS",22,"WW",36)));

运行结果如下：

+-------------+--+
|     _c0     |
+-------------+--+
| [25,22,36]  |
+-------------+--+

（5）array_contains(array, value)

该数组 Array 中包含 value 返回 true，否则返回 false
返回值类型：boolean

示例：

select array_contains(array(1,2,34,5),2);

运行结果如下：

+-------+--+
|  _c0  |
+-------+--+
| true  |
+-------+--+

（6）sort_array(array)

对数组进行正序排序
返回值类型：array

示例：

select sort_array(array(1,2,5,4,2,3,8,9,6,7,2,1,0));

运行结果如下：

+------------------------------+--+
|             _c0              |
+------------------------------+--+
| [0,1,1,2,2,2,3,4,5,6,7,8,9]  |
+------------------------------+--+

2.6、条件函数

函数	说明
if(x, y, z)	如果条件 x 为结果 true 就返回 x，否则返回 y
case when	类似于 java 中的 switch
isnull(x)	如果 x 为 null 就返回 true，否则返回 false
isnotnull (x)	如果 x 为非 null 就返回 true，否则返回 false
nvl(x,y)	将 x 替换为 y
coalesce	返回第一非null的值，如果全部都为 null 就返回 null

（1）if(x, y, z)

条件判断输出
效果与三元表达式相同
第一个参数结果为 true，则返回第二个参数，反之则返回第三参数

示例：
工资分类
示例：

select name,
if(salary>5000,"高收入人群","低收入人群") as c
from employee_contract;

运行结果如下：

+----------+--------+--+
|   name   |   c    |
+----------+--------+--+
| Michael  | 低收入人群  |
| Will     | 低收入人群  |
| Wendy    | 低收入人群  |
| Steven   | 高收入人群  |
| Lucy     | 高收入人群  |
| Lily     | 低收入人群  |
| Jess     | 高收入人群  |
| Mike     | 高收入人群  |
| Wei      | 高收入人群  |
| Yun      | 高收入人群  |
| Richard  | 高收入人群  |
+----------+--------+--+

（2）case when

类似于 java 中的 switch 语句
用于多分支条件判断

示例：
– 薪资分级
– 薪资小于4000，低收入人群
– 薪资大于4000，小于等于6000，中收入人群
– 薪资大于6000，高收入人群

select name,salary,
(case when salary<=4000 then "低收入人群"
when salary>4000 and salary<6000 then "中收入人群"
else "高收入人群" end) as fenlei
from employee_contract;

运行结果如下：

+----------+---------+---------+--+
|   name   | salary  | fenlei  |
+----------+---------+---------+--+
| Michael  | 5000    | 中收入人群   |
| Will     | 4000    | 低收入人群   |
| Wendy    | 4000    | 低收入人群   |
| Steven   | 6400    | 高收入人群   |
| Lucy     | 5500    | 中收入人群   |
| Lily     | 5000    | 中收入人群   |
| Jess     | 6000    | 高收入人群   |
| Mike     | 6400    | 高收入人群   |
| Wei      | 7000    | 高收入人群   |
| Yun      | 5500    | 中收入人群   |
| Richard  | 8000    | 高收入人群   |
+----------+---------+---------+--+

（3）isnull

判断字符串是否为空
为空输出 true，否则输出 false
返回值类型：boolean

示例：

select isnull(null);
select isnull("null");	-- 注意：这表示一个null字符串，不为空，返回false

运行结果如下：

+-------+--+
|  _c0  |
+-------+--+
| true  |
+-------+--+

（4）isnotnull

与 isnull 相反
判断字符串是否不为空
不为空输出 true，否则输出 false
返回值类型：boolean

示例：

select isnotnull("null");

运行结果如下：

+-------+--+
|  _c0  |
+-------+--+
| true  |
+-------+--+

（5）nvl(x,y)

对 x 做处理，替换为y

示例：

select nvl(null,0)

运行结果如下：

+------+--+
| _c0  |
+------+--+
| 0    |
+------+--+

（6）coalesce

返回第一个不为空的值
若全部为空，则返回 null

示例：

select coalesce(null,null,"hello","word");

运行结果如下：

+--------+--+
|  _c0   |
+--------+--+
| hello  |
+--------+--+

2.7、聚合函数

函数	说明
count	计算结果行数
sum	求和
max	求最大值
min	求最小值
avg	求平均值

示例：
– 按部门分组
– 求员工人数、每个部门最高薪资、最低薪资、薪资总和以及平均薪资

select dept_num,
count(name) as num,
max(salary) as max_salary,
min(salary) as min_salary,
sum(salary) as sum_salary,
avg(salary) as avg_salary
from employee_contract
group by dept_num;

运行结果如下：

+-----------+------+-------------+-------------+-------------+--------------------+--+
| dept_num  | num  | max_salary  | min_salary  | sum_salary  |     avg_salary     |
+-----------+------+-------------+-------------+-------------+--------------------+--+
| 1000      | 5    | 6400        | 4000        | 24900       | 4980.0             |
| 1001      | 3    | 6400        | 5000        | 17400       | 5800.0             |
| 1002      | 3    | 8000        | 5500        | 20500       | 6833.333333333333  |
+-----------+------+-------------+-------------+-------------+--------------------+--+

2.8、表生成函数

函数	说明
explode(array)	对于 array 中的每个元素生成一行且包含该元素
explode(map)	每行对应每个map键值对，其中一个字段是map的键，另一个字段是map的值
posexplode(array)	与explode类似，不同的是还返回各元素在数组中的位置
json_tuple(jsonStr, k1, k2, …)	从一个 json 字符串中获取多个键并作为一个元组返回，与get_json_object不同的是此函数能一次获取多个键值

（1）explode(array)

对于 array 中的每个元素生成一行且包含该元素

示例：

select explode(array(1,2,3,4,5));

运行结果如下：

+------+--+
| col  |
+------+--+
| 1    |
| 2    |
| 3    |
| 4    |
| 5    |
+------+--+

（2）explode(map)

每行对应每个 map 键值对，其中一个字段是 map 的键，另一个字段是 map 的值

示例：

select explode(array(1,2,3,4,5));

运行结果如下：

+------+--+
| col  |
+------+--+
| 1    |
| 2    |
| 3    |
| 4    |
| 5    |
+------+--+

（3）posexplode(array)

与explode类似，不同的是还返回各元素在数组中的位置

示例：

select posexplode(array(4,9,8,7,6));

运行结果如下：

+------+------+--+
| pos  | val  |
+------+------+--+
| 0    | 4    |
| 1    | 9    |
| 2    | 8    |
| 3    | 7    |
| 4    | 6    |
+------+------+--+

（4）stack(int n, v_1, v_2, …, v_k)

把k列转换成 n 行，每行有 k/n 个字段，其中 n 必须是常数
注意：k 必须是 n 的倍数，k/n 为列数

示例：

select stack(1,"ZS","LS","WW")

运行结果如下：

+-------+-------+-------+--+
| col0  | col1  | col2  |
+-------+-------+-------+--+
| ZS    | LS    | WW    |
+-------+-------+-------+--+

3、自定义函数

3.1、Hive UDF函数

（1）开发流程

继承UDF类或GenericUDF类
重写evaluate()方法并实现函数逻辑
编译打包为jar文件
复制到正确的HDFS路径
使用jar创建临时/永久函数
调用函数

（2）UDF的实现

创建Maven项目
创建类继承UDF类，重写 evaluate() 方法，编写函数功能

public class MyUDF extends UDF {
    //重写evaluate方法
    public Text evaluate(Text s){
        //把字符串转换成小写
        return new Text(s.toString().toLowerCase());
    }
}

打 jar 包，并上传到 Linux
添加 jar 包到 Hive

add jar /opt//hiveUDF.jar;

创建临时自定义函数

临时函数仅当前会话有效，退出后自动删除
不使用 temporary 关键字则创建永久 UDF 函数
永久 UDF 仅在创建函数时所在的库有用

create temporary function hiveudf as "cn.kgc.udf.MyUDF";

使用临时 UDF 函数

select hiveudf("hello WORD");

结果如下：

+-------------+--+
|     _c0     |
+-------------+--+
| hello word  |
+-------------+--+

4、窗口函数

常用窗口函数请查看： Hive窗口函数.

本文章为转载内容，我们尊重原作者对文章享有的著作权。如有内容错误或侵权问题，欢迎原作者联系我们进行内容更正或删除文章。

上一篇：mysql 科学计数法转字符串 mysql科学计数法转换

下一篇：MySQL重启日志 mysql重做日志作用

提问和评论都可以，用心的回复会被更多人看到评论

发布评论

相关文章

官方博客	全部文章	热门标签	班级博客
了解我们	网站地图	意见反馈

鸿蒙开发者社区	51CTO学堂
51CTO	软考资讯