Hive Learning Path: Hive DQL

mb65094bd81c185 2023-11-26 12:36:25

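© Copyright belongs to the author. This is an original work by 51CTO blog author mb65094bd81c185; contact the author for permission to repost, otherwise legal liability will be pursued.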

VII. Hive DQL

1. Syntax

select [distinct|all] expr, expr, ...
from tbName
[where whereCondition]
[group by colList]
[having havingCondition]
[order by colList]
[cluster by colList | [distribute by colList] [sort by colList]]
[limit num]

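2. where clause

Operators available in a where condition:

is null, is not null
between ... and ...
in (v1, v2, v3, ...) | not in (v1, v2, v3, ...)
and, or
like | rlike (rlike matches against a Java regular expression)
% wildcard (used with like): matches any sequence of zero or more characters
_ placeholder (used with like): matches exactly one character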
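The difference between the two like symbols, and between like and rlike, can be sketched outside Hive. The Python below is only an illustration of the matching rules: the helper names `like` and `rlike` are made up for this sketch, and Python's `re` module stands in for the Java regular expressions that Hive actually uses, so the two dialects may differ on advanced syntax.

```python
import re

def like(value: str, pattern: str) -> bool:
    """Hypothetical helper: emulate SQL LIKE by translating the pattern to a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # % matches zero or more characters
        elif ch == "_":
            parts.append(".")            # _ matches exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is a literal
    # LIKE must match the whole value, hence fullmatch
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None

def rlike(value: str, regex: str) -> bool:
    """Hypothetical helper: rlike is true if any substring matches the regex."""
    return re.search(regex, value) is not None

print(like("apple", "a%"))      # True: % absorbs "pple"
print(like("apple", "a_ple"))   # True: _ stands for the single character "p"
print(like("apple", "a_le"))    # False: _ cannot stand for two characters
print(rlike("foobar", "oba"))   # True: a substring match is enough
```

Note that `like` has to cover the entire value, while `rlike` succeeds on a partial match unless the expression is anchored with `^` and `$`.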

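3. group by clause

Overview:

Groups rows by the values of the given columns; rows with the same values are placed in the same group.

Usually used together with aggregate functions.

SQL example:

select c1, c2
from tbName
where condition ------> executed on the map side
group by c1, c2 ------> executed on the reduce side
having condition ------> executed on the reduce side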

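3.1 Example:

1. Create the tables and load the data

create table order_tb(oid int, num int, pname string, uid int) row format delimited fields terminated by ',';
load data local inpath '/home/zhangsan/order.txt' into table order_tb;

create table user_tb(uid int, name string, addr string) row format delimited fields terminated by ',';
load data local inpath '/home/zhangsan/user.txt' into table user_tb;

2. Notes

Every non-aggregated column after select must also appear in group by.
as can be used to give tables and columns aliases.
where and having differ in three ways: position (where comes before group by, having comes after it), the fields the condition may filter on (where uses plain columns and cannot use aggregate functions; having can filter on aggregate results), and the data they act on (where filters one row at a time; having filters one group at a time).

3. Group the order table by user id

select o.uid from order_tb as o group by o.uid;

Use an aggregate function to count each user's purchases (count(o.num) counts the order rows whose num is not null; sum(o.num) would give the total quantity):

select o.uid, count(o.num) from order_tb as o group by o.uid;

Use an aggregate function to count each user's purchases and show only the users whose count is greater than 2:

select o.uid, count(o.num) from order_tb as o group by o.uid having count(o.num) > 2;

4. Common aggregate functions

count() max() min() sum() avg()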
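To see the where/having distinction on concrete rows without a Hive cluster, the queries can be tried against an in-memory SQLite database from Python. This is only a sketch: SQLite stands in for Hive, the sample rows are invented for illustration, and the Hive-specific `row format` clause is dropped because SQLite does not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table order_tb(oid int, num int, pname text, uid int)")

# Invented sample data: (oid, num, pname, uid)
rows = [
    (1, 2, "pen", 100),
    (2, 1, "book", 100),
    (3, 5, "cup", 100),
    (4, 3, "bag", 200),
    (5, 1, "pen", 200),
    (6, 4, "lamp", 300),
]
conn.executemany("insert into order_tb values (?, ?, ?, ?)", rows)

# where removes individual rows BEFORE grouping
filtered_rows = conn.execute(
    "select o.uid, count(o.num), sum(o.num) from order_tb as o "
    "where o.num > 1 group by o.uid order by o.uid"
).fetchall()

# having removes whole groups AFTER aggregation
filtered_groups = conn.execute(
    "select o.uid, count(o.num) from order_tb as o "
    "group by o.uid having count(o.num) > 2"
).fetchall()

print(filtered_rows)    # [(100, 2, 7), (200, 1, 3), (300, 1, 4)]
print(filtered_groups)  # [(100, 3)]
```

The first query also returns count and sum side by side, which shows that count(o.num) is the number of order rows while sum(o.num) is the total quantity.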

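4. order by clause

Overview

Sorts the result by the given columns.

Behavior: order by can sort on several columns and produces a global ordering. The sort runs in the reduce phase, and only one reducer is used.

Problem: because all rows pass through a single reducer, order by becomes slow when the data volume is large.

asc | desc: ascending (the default) or descending; the keyword applies to the column it follows.

Example

select * from order_tb order by num desc;
select * from order_tb order by num, oid asc; -- sort on multiple columns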
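Multi-column ordering compares the first column and falls back to the next one only on ties, and each asc or desc keyword affects just its own column. A small Python sketch of that rule, using invented (oid, num) pairs rather than real table data:

```python
# Invented sample rows: (oid, num)
orders = [(1, 3), (2, 1), (3, 3), (4, 2)]

# Equivalent of: order by num, oid asc  (both columns ascending)
by_num_then_oid = sorted(orders, key=lambda r: (r[1], r[0]))

# Equivalent of: order by num desc, oid asc  (direction is per column)
by_num_desc_then_oid = sorted(orders, key=lambda r: (-r[1], r[0]))

print(by_num_then_oid)       # [(2, 1), (4, 2), (1, 3), (3, 3)]
print(by_num_desc_then_oid)  # [(1, 3), (3, 3), (4, 2), (2, 1)]
```

In both results the two rows with num = 3 stay ordered by oid, which is the tie-break supplied by the second sort column.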

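5. sort by (sorting within each reducer)

sort by sorts inside MapReduce, within each reducer; it is not a global sort.

1. Set the number of reducers. Precedence: default value < configuration file < command line.

set key=value; sets a property
set key; shows a property's value
set mapreduce.job.reduces=3; -- set the property value
set mapreduce.job.reduces; -- view the property value

2. Run the sort

select * from order_tb sort by num;

3. Inspect the data written by each reducer (export the result to a local directory)

insert overwrite local directory '/home/zhangsan/sortbyResult' select * from order_tb sort by num desc;

Summary: sort by only sorts the contents of each partition, that is, the rows handled by one reducer.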
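Why the result is not globally ordered can be sketched in a few lines of Python. This is a simulation, not Hive code: the function name `sort_by` is made up, and round-robin assignment is only an assumed stand-in for however Hive spreads rows across reducers when no distribute by is given.

```python
def sort_by(rows, key, num_reducers, reverse=False):
    """Hypothetical simulation: spread rows over reducers, then sort each one."""
    partitions = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        # Assumption: round-robin placement, purely for illustration
        partitions[i % num_reducers].append(row)
    # Each reducer sorts only the rows it received
    return [sorted(p, key=key, reverse=reverse) for p in partitions]

nums = [5, 1, 4, 2, 6, 3]
parts = sort_by(nums, key=lambda n: n, num_reducers=3)
combined = [n for p in parts for n in p]

print(parts)                       # [[2, 5], [1, 6], [3, 4]]
print(combined == sorted(nums))    # False: sorted per reducer, not overall
```

Each inner list corresponds to one output file in the exported directory: every file is ordered on its own, but reading the files one after another does not give a sorted sequence.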

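6. distribute by (partitioning)

distribute by corresponds to the partition step in MapReduce: it decides which reducer each row is sent to. It is used together with sort by.

The distribute by clause must be written before sort by (partition first, then sort).

Example: when the SQL is translated to MapReduce, the oid field is used for partitioning and the num field for sorting.

insert overwrite local directory '/home/zhangsan/distributebyResult' select * from order_tb distribute by oid sort by num desc;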
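The "partition first, then sort" order can be illustrated with a short Python simulation. It is a sketch under stated assumptions: `distribute_sort` is a made-up name, the rows are invented (oid, num) pairs, and `key % num_reducers` is a simplified stand-in for Hive's hash partitioning.

```python
def distribute_sort(rows, dist_key, sort_key, num_reducers, reverse=False):
    """Hypothetical simulation of: distribute by <dist_key> sort by <sort_key>."""
    partitions = [[] for _ in range(num_reducers)]
    for row in rows:
        # Step 1 (distribute by): the partition column picks the reducer.
        # Assumption: plain modulo instead of Hive's real hash function.
        partitions[dist_key(row) % num_reducers].append(row)
    # Step 2 (sort by): each reducer sorts its own rows
    return [sorted(p, key=sort_key, reverse=reverse) for p in partitions]

# Invented sample rows: (oid, num)
orders = [(1, 2), (2, 7), (3, 4), (4, 9), (5, 1), (6, 3)]

# Equivalent of: distribute by oid sort by num desc, with 3 reducers
parts = distribute_sort(
    orders,
    dist_key=lambda r: r[0],
    sort_key=lambda r: r[1],
    num_reducers=3,
    reverse=True,
)

for reducer_id, part in enumerate(parts):
    print(reducer_id, part)
# 0 [(3, 4), (6, 3)]
# 1 [(4, 9), (1, 2)]
# 2 [(2, 7), (5, 1)]
```

Unlike plain sort by, the reducer a row lands on is now determined by its oid, so rows with the same partition value always end up in the same output file.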

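7. cluster by (partition and sort)

insert overwrite local directory '/home/zhangsan/clusterbyResult' select * from order_tb cluster by num;

Note: cluster by uses the same column for both partitioning and sorting, and the sort direction cannot be specified (the sort is always ascending).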
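Since cluster by partitions and sorts on one and the same column, it behaves like distribute by and sort by applied to that column in ascending order. The Python sketch below simulates that equivalence; `cluster_by` is a made-up name, the (oid, num) rows are invented, and modulo is again only a simplified stand-in for Hive's hash partitioning.

```python
def cluster_by(rows, key, num_reducers):
    """Hypothetical simulation: same column for partitioning and ascending sort."""
    partitions = [[] for _ in range(num_reducers)]
    for row in rows:
        # Assumption: modulo replaces Hive's real hash function
        partitions[key(row) % num_reducers].append(row)
    # No direction argument: the sort is always ascending
    return [sorted(p, key=key) for p in partitions]

# Invented sample rows: (oid, num)
orders = [(1, 4), (2, 1), (3, 4), (4, 2), (5, 1), (6, 3)]

# Equivalent of: cluster by num, with 2 reducers
parts = cluster_by(orders, key=lambda r: r[1], num_reducers=2)

print(parts[0])  # [(4, 2), (1, 4), (3, 4)]
print(parts[1])  # [(2, 1), (5, 1), (6, 3)]
```

Rows sharing a num value land in the same partition and sit next to each other after the sort, which is what makes cluster by convenient for processing equal keys together.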

8. join clause (multi-table join queries) (subqueries)