熟悉SQL的读者对Group by 和 having应该不陌生。但可能不熟悉 cube, rollup, grouping sets 。本文带你学习并比较它们的差异,内容参考《mastering postgresql 10》.
1. 示例数据
在正式开始之前,我们先创建示例表,并copy一些示例数据。
create table t_oil(
region text,
country text,
year int ,
production int ,
consumption int
);
利用copy命令加载示例数据:
copy t_oil from program 'curl https://www.cybertec-postgresql.com/secret/oil_ext.txt';
执行成功后加载644条记录数据。
2. 分组分析
group by 对每个组把多行聚集成一行并返回。但有时开发报表应用,另外还需要所有记录的聚集结果,如平均值。利用rollup可以实现:
select region,country,avg(production)
from t_oil
group by rollup(region,country)
rollup另外还返回了所有记录的平均值。实际报表应用中需要汇总记录,通过rollup则不再需要运行两个查询实现。
另外需要指出的postgresql 10对这些操作使用hash实现对性能有了很大提升:
explain
select region,avg(production)
from t_oil
group by rollup(region);
返回结果:
MixedAggregate (cost=0.00..17.31 rows=3 width=44)
Hash Key: region
Group Key: ()
-> Seq Scan on t_oil (cost=0.00..12.44 rows=644 width=16)
当然rollup还可以指定多个分组字段,如果需要排序可以使用order by 子句:
select region,country,avg(production)
from t_oil
group by rollup(region,country)
order by region,country;
这个示例另外还返回三个汇总记录行,一个是Middle East
单独汇总,另一个是North America
单独汇总,最后一个是所有记录的汇总。这个结果很容易实现下钻功能。
rollup适合立刻展示返回结果场景。但做报表过程中,可能需要预计算更多数据确保更灵活。cube关键词可能是你想要的。
select region,country,avg(production)
from t_oil
where country in ('USA','Canada','Iran','Oman')
group by cube (region,country)
order by region,country;
返回结果:
region | country | avg |
Middle East | Iran | 3631.6956521739130435 |
Middle East | Oman | 586.4545454545454545 |
Middle East | 2142.9111111111111111 | |
North America | Canada | 2123.2173913043478261 |
North America | USA | 9141.3478260869565217 |
North America | 5632.2826086956521739 | |
Canada | 2123.2173913043478261 | |
Iran | 3631.6956521739130435 | |
Oman | 586.4545454545454545 | |
USA | 9141.3478260869565217 | |
3906.7692307692307692 |
共返回11条结果,其结果相当于:group by region,country + group by region + group by country + 所有记录平均值。因此一次性聚集不同级别分组值,其结果包括了所有可能的分组。
rollup 和 cube 是在grouping sets子句之上便捷特性,使用grouping sets可以显示列出你想要的分组聚集:
select region,country,round( avg(production) ,2) avg
from t_oil
where country in ('USA','Canada','Iran','Oman')
group by grouping sets ((),region,country)
order by region,country;
这里需要三个分组:所有记录,group by region ,group by country.如果你需要对region,country分组,可以增加(region,country)
select region,country,round( avg(production) ,2) avg
from t_oil
where country in ('USA','Canada','Iran','Oman')
group by grouping sets ((),region,country,(region,country))
order by region,country;
返回结果和上面cube (region,country)
示例结果一样。
另外还可以使用grouping函数返回具体启用那个分组:
select grouping (region), grouping (country),region,country,round( avg(production) ,2) avg
from t_oil
where country in ('USA','Canada','Iran','Oman')
group by grouping sets ((),region,country,(region,country))
order by region,country;
grouping (region) 当对region分组时返回1,否则返回0。另外还可以在having中使用grouping函数进行过滤结果:
select grouping (region), grouping (country),region,country,round( avg(production) ,2) avg
from t_oil
where country in ('USA','Canada','Iran','Oman')
group by grouping sets ((),region,country,(region,country))
having grouping(region) = 0
order by region,country;
我们还可以使用filter子句合并聚集结果;
select region,round( avg(production) ,2) avg,
round( avg(production) filter (where year < 1990),2) as old,
round( avg(production) filter (where year >= 1990),2) as new
from t_oil
where country in ('USA','Canada','Iran','Oman')
group by rollup (region)
order by region;
其中 avg = old + new ,通过filter可以展示更多信息。
3. 总结
本文介绍了分组分析函数,rollup、cube、grouping sets,其中grouping sets可以显示指定任意分组组合,同时还支持使用grouping对分组进行标识、过滤。