文章目录
- 1. 多维数据分析
- 2. 测试数据准备
- 3. 实现多维数据分析
- 3.1 with cube
- 3.2 with rollup
- 3.3 grouping sets
版本信息:hive 2.1.1
Grouping__ID function was fixed in Hive 2.3.0, thus behavior before that release is different (this is expected). For each column, the function would return a value of “0” iif that column has been aggregated in that row, otherwise the value is “1”. hive 2.3.0 之后的版本grouping__id 的生成逻辑改变了
1. 多维数据分析
在多维分析中,数据是按照维度(观察数据的角度)来表示的,比如商品、城市、客户。而维通常按层次(层次维度)组织的,如城市、省、国家,再比如时间也是有层次的,如天、周、月、季度和年。不同的管理者可以从不同的维度(视角)去观察这些数据,这些在多个不同维度上对数据进行综合考察的手段就是通常所说的数据仓库多维查询,最常见的就如上卷(roll-up)和下钻(drill-down)了,所谓上卷,指的是选定特定的数据范围之后,对其进行汇总统计以获取更高层次的信息。所谓下钻,指的是选定特定的数据范围之后,需要进一步查看细节的数据。从另一种意义上说,钻取就是针对多维展现的数据,进一步探究其内部的组成和来源。值得注意的是,上卷和下钻要求维度具有层级结构,即数仓中所说的层次维度。1
2. 测试数据准备
-- 测试数据
create table test.test_zhongxg_20220619 as
select 'A' as f1,'A' as f2,'B' as f3,1 as f4
union all
select 'B' as f1,'B' as f2,'A' as f3,1 as f4
union all
select 'A' as f1,'A' as f2,'A' as f3,2 as f4
;
3. 实现多维数据分析
3.1 with cube
sql实例
-- with cube
-- 分组组合最全,是各个维度值的笛卡尔(包含null)组合,这个用法一般可以嵌套一层,通过约束 GROUPING__ID 的值来筛选自己想要的维度组合
select f1,f2,f3
,sum(f4) as sum_f4
,grouping__id as `汇总维度标识`
,cast(grouping__id as bigint) as `类型转换`
,bin(cast(grouping__id as bigint)) as `二进制转换`
,reverse(bin(cast(grouping__id as bigint))) as `字符串翻转`
,rpad(reverse(bin(cast(grouping__id as bigint))),3,'0') as `字符串右补齐`
from test.test_zhongxg_20220619
group by f1,f2,f3
with cube
;
执行结果
3.2 with rollup
sql实例
-- with rollup
-- 各维度组合应满足,前一维度为null后一位维度必须为null,前一维度取非null时,下一维度随意
select f1,f2,f3
,sum(f4) as sum_f4
,grouping__id as `汇总维度标识`
,cast(grouping__id as bigint) as `类型转换`
,bin(cast(grouping__id as bigint)) as `二进制转换`
,reverse(bin(cast(grouping__id as bigint))) as `字符串翻转`
,rpad(reverse(bin(cast(grouping__id as bigint))),3,'0') as `字符串右补齐`
from test.test_zhongxg_20220619
group by f1,f2,f3
with rollup
;
执行结果
3.3 grouping sets
sql实例
-- grouping sets
-- 自定义维度,根据需要分组即可。(通过grouping sets的使用可以简化SQL,比group by单维度进行union性能更好)
select f1,f2,f3
,sum(f4) as sum_f4
,grouping__id as `汇总维度标识`
,cast(grouping__id as bigint) as `类型转换`
,bin(cast(grouping__id as bigint)) as `二进制转换`
,reverse(bin(cast(grouping__id as bigint))) as `字符串翻转`
,rpad(reverse(bin(cast(grouping__id as bigint))),3,'0') as `字符串右补齐`
from test.test_zhongxg_20220619
group by f1,f2,f3
grouping sets((f1),(f1,f2))
;
执行结果
注意:grouping sets使用的维度不能是函数表达式之类的,一般可以先把结果嵌套一层,给函数表达式的维度起别名,再用别名来多维分析!!!