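Hive Queries

Original

塞上江南o, 2022-12-28 15:23:41. Blog category: Hive

Tags: hive, subquery, fields. Categories: Hadoop, big data

© Copyright belongs to the author: this is an original work by 51CTO blog author 塞上江南o. Contact the author for permission to repost; unauthorized reposting may be pursued legally.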

Hive SQL query execution order

from > on > join > where > group by > aggregate functions (for example, counting the rows in each group) > having > select > distinct > order by > limit
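
To make the order concrete, here is a sketch of one query annotated with the step at which each clause is logically evaluated. The emp table and its deptno and sal columns are assumptions for illustration, and the numbering describes the logical order listed above, not the physical plan Hive actually runs.

```sql
-- Hypothetical table: emp(name, deptno, sal)
SELECT   deptno,                 -- 5. select
         count(*) AS cnt         -- 3. aggregate, evaluated once per group
FROM     emp                     -- 1. from
WHERE    sal > 1000              -- 2. where (filters rows)
GROUP BY deptno                  -- 3. group by
HAVING   count(*) >= 2           -- 4. having (filters groups)
ORDER BY cnt DESC                -- 6. order by
LIMIT    10;                     -- 7. limit
```

One practical consequence: the alias cnt does not exist yet when where runs, so it can be used in order by but not in where.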

Aggregate functions (sum, min, max, avg, count)

How Hive aggregate functions handle NULL

Suppose we have the following table, test_null:

c1	c2
aaa	3
bbb	3
ccc	null
ddd	6

select count(*), -- ans: 4, counts all rows
       count(1), -- ans: 4, count(*) == count(1), counts all rows
       count(c2) -- ans: 3, counts only the rows where c2 is not NULL
from test_null;


select sum(c2), -- ans: 12, NULL rows are skipped
       max(c2), -- ans: 6, NULL rows are skipped
       min(c2), -- ans: 3, NULL rows are skipped
       avg(c2)  -- ans: 12/3 = 4, NULL rows are skipped
from test_null;

Summary:

count(*) counts every row, including rows that contain NULL
count(column) skips the rows where that column is NULL
min, max, avg, and sum skip NULL values
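
The rules above can be tried without creating a table. The sketch below rebuilds test_null inline with a CTE (this assumes a Hive version that allows SELECT without FROM and UNION ALL at the top level of a CTE) and adds one column that is not in the original example: avg over coalesce(c2, 0), for the case where a NULL should count as zero instead of being skipped.

```sql
-- Inline copy of test_null so the query runs without creating a table.
WITH test_null AS (
  SELECT 'aaa' AS c1, 3 AS c2
  UNION ALL SELECT 'bbb', 3
  UNION ALL SELECT 'ccc', CAST(NULL AS INT)
  UNION ALL SELECT 'ddd', 6
)
SELECT count(*)             AS all_rows,       -- 4: counts every row
       count(c2)            AS non_null_rows,  -- 3: skips the NULL row
       avg(c2)              AS avg_skip_null,  -- 12 / 3 = 4
       avg(coalesce(c2, 0)) AS avg_null_as_0   -- 12 / 4 = 3
FROM test_null;
```

Whether a NULL should be skipped or treated as zero depends on what the column means, so the coalesce variant is a choice to make per query, not a default.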

Hive count

Difference between count(*), count(1), and count(column)

In short:

count(*) == count(1)

count(1): counts all rows, including rows that contain NULL
count(column): counts only the rows where that column is not NULL

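Hive distinct

Returns the matching rows with duplicate rows removed

select distinct age from emp;

Deduplicating on the whole row

With several columns, distinct applies to the combination of all selected columns. Example: list each distinct height within each department.

select distinct height,depton from dept;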
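
A small sketch of what "whole row" means here. The rows are made up for illustration; only the column names come from the query above.

```sql
-- Hypothetical rows for dept(height, depton)
WITH dept AS (
  SELECT 165 AS height, 10 AS depton
  UNION ALL SELECT 165, 10   -- exact duplicate of the first row
  UNION ALL SELECT 165, 20   -- same height, different department
  UNION ALL SELECT 170, 10
)
SELECT DISTINCT height, depton
FROM dept;
-- Three rows remain: (165, 10), (165, 20), (170, 10), in no guaranteed order.
```

Only the exact duplicate is dropped. The same result can be written as select height, depton from dept group by height, depton.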

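Hive where

Restrictions on where

Aggregate functions cannot be used in a where condition
Reason: an aggregate function can only be evaluated once the result set is fixed, and the where clause is still in the process of determining that result set
Aggregate functions include count, sum, avg, and so on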
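
When a row does need to be compared against an aggregate, one common workaround is to compute the aggregate in a one-row derived table and join to it. This is a sketch under assumptions: the emp rows and the name column are invented for illustration.

```sql
-- Hypothetical rows for emp(name, age)
WITH emp AS (
  SELECT 'a' AS name, 20 AS age
  UNION ALL SELECT 'b', 30
  UNION ALL SELECT 'c', 40
)
-- Invalid: SELECT name FROM emp WHERE age > avg(age);
SELECT e.name, e.age
FROM emp e
CROSS JOIN (SELECT avg(age) AS avg_age FROM emp) a  -- one-row derived table
WHERE e.age > a.avg_age;                             -- keeps only 'c' (40 > 30)
```

By the time where runs, avg_age is an ordinary column of the joined row, so no aggregate appears in the where condition itself.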

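Difference between where and on in a Hive left join

The on condition is applied while the intermediate joined table is being built; whether or not it evaluates to true, every left-table row is returned (unmatched rows get NULL in the right-table columns)
The where condition is applied after the intermediate table has been built and filters its rows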
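
The difference shows up as soon as a condition on the right table is moved between the two clauses. In this sketch the tables a and b and their contents are hypothetical.

```sql
-- Filter in ON: both left rows survive; id = 2 gets NULL for b.tag.
WITH a AS (SELECT 1 AS id UNION ALL SELECT 2),
     b AS (SELECT 1 AS id, 'x' AS tag UNION ALL SELECT 2, 'y')
SELECT a.id, b.tag
FROM a
LEFT JOIN b ON a.id = b.id AND b.tag = 'x';
-- rows: (1, 'x'), (2, NULL)

-- Filter in WHERE: applied after the join, so the id = 2 row is dropped.
WITH a AS (SELECT 1 AS id UNION ALL SELECT 2),
     b AS (SELECT 1 AS id, 'x' AS tag UNION ALL SELECT 2, 'y')
SELECT a.id, b.tag
FROM a
LEFT JOIN b ON a.id = b.id
WHERE b.tag = 'x';
-- rows: (1, 'x')
```

A where condition on a right-table column effectively turns the left join into an inner join, because the NULL-padded rows can never satisfy it.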

Hive group by

Restrictions on group by

Every column after select must be either one of the group by columns or wrapped in an aggregate function
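
A sketch of the rule, with an invalid query left as a comment. The emp rows are hypothetical; collect_set is a built-in Hive aggregate, used here only as one way to keep a non-grouped column in the output.

```sql
-- Hypothetical rows for emp(name, deptno, age)
WITH emp AS (
  SELECT 'a' AS name, 10 AS deptno, 20 AS age
  UNION ALL SELECT 'b', 10, 30
  UNION ALL SELECT 'c', 20, 40
)
-- Invalid: name is neither grouped nor aggregated.
--   SELECT deptno, name, max(age) FROM emp GROUP BY deptno;
SELECT deptno,                         -- grouping column
       max(age)          AS max_age,   -- aggregated column
       collect_set(name) AS names      -- non-grouped column wrapped in an aggregate
FROM emp
GROUP BY deptno;
```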

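Difference between group by and partition by in Hive

group by collapses each group into one row, so the row count drops; partition by (in a window function) computes the aggregate per partition but keeps every row of the original table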
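
The two queries below compute the same per-department sum over the same hypothetical emp rows. Only the number of output rows differs.

```sql
-- group by: one row per department (2 rows out of 3).
WITH emp AS (
  SELECT 'a' AS name, 10 AS deptno, 20 AS age
  UNION ALL SELECT 'b', 10, 30
  UNION ALL SELECT 'c', 20, 40
)
SELECT deptno, sum(age) AS dept_age_sum
FROM emp
GROUP BY deptno;
-- rows: (10, 50), (20, 40)

-- partition by: every input row is kept (3 rows), each carrying its department's sum.
WITH emp AS (
  SELECT 'a' AS name, 10 AS deptno, 20 AS age
  UNION ALL SELECT 'b', 10, 30
  UNION ALL SELECT 'c', 20, 40
)
SELECT name, deptno, age,
       sum(age) OVER (PARTITION BY deptno) AS dept_age_sum
FROM emp;
-- rows: ('a', 10, 20, 50), ('b', 10, 30, 50), ('c', 20, 40, 40)
```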

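Hive having

What the having keyword does

To filter the groups further after grouping, use the having keyword

Difference between where and having

having filters again after group by has formed the groups, so it can only reference grouping columns or aggregate functions
where filters directly on the columns of the table, before any grouping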
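
Both filters can appear in the same query, each acting at its own stage. A minimal sketch with made-up emp rows:

```sql
-- Hypothetical rows for emp(name, deptno, age)
WITH emp AS (
  SELECT 'a' AS name, 10 AS deptno, 20 AS age
  UNION ALL SELECT 'b', 10, 30
  UNION ALL SELECT 'c', 20, 40
  UNION ALL SELECT 'd', 20, 15
)
SELECT deptno, count(*) AS cnt
FROM emp
WHERE age >= 18          -- row filter, before grouping: drops 'd'
GROUP BY deptno
HAVING count(*) >= 2;    -- group filter, after grouping: keeps only deptno 10
```

Department 20 has two rows in the table, but only one survives the where filter, so its group fails the having condition.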

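Hive UNION and UNION ALL

UNION

-- combine the results of both queries and remove duplicate rows (output order is not guaranteed without order by)
select num,name from student_local
UNION
select num,name from student_hdfs;

UNION ALL

-- combine the results of both queries and keep duplicate rows (no deduplication, no sorting)
select num,name from student_local
UNION ALL
select num,name from student_hdfs;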

To apply order by, sort by, cluster by, distribute by, or limit to a single SELECT, wrap that SELECT in a parenthesized subquery

SELECT sno,sname FROM (select sno,sname from student_local LIMIT 2) subq1
UNION
SELECT sno,sname FROM (select sno,sname from student_hdfs LIMIT 3) subq2;

To apply order by, sort by, cluster by, distribute by, or limit to the entire UNION result, put the clause after the last SELECT

select sno,sname from student_local
UNION
select sno,sname from student_hdfs
order by sno desc;
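
To see the deduplication difference between the two operators, the sketch below counts the rows each one produces over the same inputs. The table contents are invented, and the query assumes Hive 1.2.0 or later, where plain UNION (that is, UNION DISTINCT) is available.

```sql
-- Hypothetical rows; (2, 'amy') appears in both tables.
WITH student_local AS (
  SELECT 1 AS num, 'tom' AS name
  UNION ALL SELECT 2, 'amy'
),
student_hdfs AS (
  SELECT 2 AS num, 'amy' AS name
  UNION ALL SELECT 3, 'bob'
)
SELECT 'union' AS op, count(*) AS row_cnt
FROM (SELECT num, name FROM student_local
      UNION
      SELECT num, name FROM student_hdfs) u      -- 3 rows: (2, 'amy') kept once
UNION ALL
SELECT 'union all' AS op, count(*) AS row_cnt
FROM (SELECT num, name FROM student_local
      UNION ALL
      SELECT num, name FROM student_hdfs) ua;    -- 4 rows: (2, 'amy') kept twice
```

UNION ALL skips the deduplication step, so it is usually the cheaper choice when duplicates are impossible or acceptable.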

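Subqueries in the Hive from clause

Up to Hive 0.12, subqueries are supported only in the FROM clause. The subquery must be given a name, because every table in a FROM clause must have a name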

select s_id
from (select s_id, s_name from student) tmp;

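Subqueries in the Hive where clause

Since Hive 0.13, subqueries are also supported in the where clause, for in and not in subqueries

select student1.*
from student1
where s_id in (select s_id from student2);  -- the subquery may select only one column

A query like the one above is called an uncorrelated subquery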
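
On Hive versions without where-clause subqueries, an in filter of this shape is commonly written as a LEFT SEMI JOIN instead. The sketch below shows that form; the rows are hypothetical, and only s_id comes from the query above.

```sql
-- Hypothetical rows; student2 contains s_id = 2 twice.
WITH student1 AS (
  SELECT 1 AS s_id, 'tom' AS s_name
  UNION ALL SELECT 2, 'amy'
),
student2 AS (
  SELECT 2 AS s_id
  UNION ALL SELECT 2
)
SELECT a.*
FROM student1 a
LEFT SEMI JOIN student2 b ON a.s_id = b.s_id;
-- returns (2, 'amy') once, even though student2 has two matching rows
```

Like in, the semi join only tests for existence: right-table columns can be referenced in the on condition but not in the select list, and a left row is never duplicated by multiple matches.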

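EXISTS and NOT EXISTS subqueries are supported as well

SELECT A
FROM T1
WHERE EXISTS (SELECT B FROM T2 WHERE T1.X = T2.Y); -- the subquery's WHERE clause references the parent query

A query like the one above is called a correlated subquery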

Hive CTE

The annotations in the images for the forms below may be loosely worded; the aim is to lay out the structure of each statement

from-style (query)
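
A minimal sketch of the from-style form, in which the FROM clause comes before SELECT. It assumes a hypothetical table student(s_id, s_name); the CTE name q1 is arbitrary.

```sql
-- Hypothetical table: student(s_id, s_name)
WITH q1 AS (
  SELECT s_id, s_name FROM student WHERE s_id = 1
)
FROM q1
SELECT *;
```

The conventional ordering, select * from q1, returns the same rows; from-style only changes where the FROM clause is written.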

Chaining CTEs (query)
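
A minimal sketch of chaining, where one CTE reads from another defined in the same WITH clause. The student(s_id, s_name) table and the filter conditions are assumptions for illustration.

```sql
-- Hypothetical table: student(s_id, s_name)
WITH q1 AS (
  SELECT s_id, s_name FROM student WHERE s_id <= 10
),
q2 AS (
  SELECT s_id, s_name FROM q1 WHERE s_name IS NOT NULL   -- q2 builds on q1
)
SELECT * FROM q2;
```

Each step gets a name, which tends to read more easily than the equivalent nested subqueries.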

insert + CTE (loading a table)
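
A minimal sketch of feeding an insert from a CTE. It assumes a hypothetical source table student(s_id, s_name); the target student_copy is created here with the same schema so that select * lines up column for column.

```sql
-- Hypothetical target table with the same schema as student.
CREATE TABLE student_copy LIKE student;

WITH q1 AS (
  SELECT s_id, s_name FROM student WHERE s_id = 1
)
FROM q1
INSERT OVERWRITE TABLE student_copy
SELECT *;
```

Note that insert overwrite replaces whatever student_copy already holds.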

create table as + CTE (creating a table)
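
A minimal sketch of create table as select (CTAS) with a CTE. The student(s_id, s_name) table and the new table name student_one are assumptions for illustration.

```sql
-- Hypothetical table: student(s_id, s_name)
CREATE TABLE student_one AS
WITH q1 AS (
  SELECT s_id, s_name FROM student WHERE s_id = 1
)
SELECT * FROM q1;
```

The new table takes its column names and types from the final select, so no separate column list is needed.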
