Solving SQL Difficulties: Intuitive Grouping

raqsoft, 2018-11-29 16:13:12

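© Copyright belongs to the author: this is an original work by raqsoft on the 51CTO blog. Contact the author for permission to reprint; unauthorized reprinting may lead to legal action.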

1. Alignment grouping

Example 1: In the given order, list the number of countries that use Chinese, English, and French as an official language.

MySQL8:

with t(name,ord) as (select 'Chinese',1
                     union all select 'English',2
                     union all select 'French',3)
select t.name, count(s.countrycode) cnt
from t left join world.countrylanguage s
  on t.name=s.language and s.isofficial='T'  -- filter in ON so a language with no match still appears with cnt 0
group by t.name, t.ord
order by t.ord;

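Note: the character set of the table and the character set of the database session must match.

(1) show variables like 'character_set_connection' shows the character set of the current session

(2) show create table world.countrylanguage shows the character set of the table

(3) set character_set_connection=[charset] changes the character set of the current session

esProc SPL: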

	A
1	=connect("mysql")
2	=A1.query@x("select * from world.countrylanguage where isofficial='T'")
3	[Chinese,English,French]
4	=A2.align@a(A3,Language)
5	=A4.new(A3(#):name, ~.len():cnt)

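A1: Connect to the database

A2: Query all records of official languages

A3: The languages to be listed

A4: Align all records by Language to the corresponding positions in A3

A5: Build a table sequence of each language and the number of countries that use it as an official language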
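
The alignment step in A4 can be illustrated outside SPL. The following is a minimal Python sketch of the idea, not the esProc implementation: the helper name align_group and the sample rows are hypothetical. It assumes that records whose key is not in the list are dropped and that empty groups are kept, which is what the description of A4 and A5 suggests.

```python
def align_group(records, keys, key_of):
    """Place each record in the group at the position of its key in `keys`.

    Groups come out in the order of `keys`; a key with no records keeps an
    empty group, and records whose key is not listed are dropped.
    """
    position = {k: i for i, k in enumerate(keys)}
    groups = [[] for _ in keys]
    for rec in records:
        i = position.get(key_of(rec))
        if i is not None:
            groups[i].append(rec)
    return groups


# Hypothetical rows standing in for world.countrylanguage (official only).
rows = [
    {"CountryCode": "AAA", "Language": "English"},
    {"CountryCode": "BBB", "Language": "French"},
    {"CountryCode": "CCC", "Language": "English"},
    {"CountryCode": "DDD", "Language": "Spanish"},
]
languages = ["Chinese", "English", "French"]

groups = align_group(rows, languages, lambda r: r["Language"])
counts = [(name, len(g)) for name, g in zip(languages, groups)]
print(counts)
```

Because the groups are positional, the result order follows the language list without an explicit sort column such as ord in the SQL version.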

Example 2: In the given order, list the number of countries that use Chinese, English, French, and any other language as an official language.

MySQL8:

with t(name,ord) as (select 'Chinese',1 union all select 'English',2
                     union all select 'French',3 union all select 'Other', 4),
s(name, cnt) as (
  select language, count(countrycode) cnt
  from world.countrylanguage s
  where s.isofficial='T' and language in ('Chinese','English','French')
  group by language
  union all
  select 'Other', count(distinct countrycode) cnt
  from world.countrylanguage s
  where isofficial='T' and language not in ('Chinese','English','French')
)
select t.name, s.cnt
from t left join s using (name)
order by t.ord;

esProc SPL:

	A
1	=connect("mysql")
2	=A1.query@x("select * from world.countrylanguage where isofficial='T'")
3	[Chinese,English,French,Other]
4	=A2.align@an(A3.to(3),Language)
5	=A4.new(A3(#):name, if(#<=3,~.len(), ~.icount(CountryCode)):cnt)

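A4: Align all records by Language to the corresponding positions in A3.to(3), and append one extra group that holds the records that cannot be aligned

A5: For the 4th group, count the distinct CountryCode values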
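
As a rough illustration of the extra group, here is a minimal Python sketch (the helper align_group_with_rest and the sample rows are hypothetical, not part of esProc). Unmatched records go to one appended group, and that group is counted by distinct country code because one country can have several other official languages.

```python
def align_group_with_rest(records, keys, key_of):
    """Like positional alignment, but unmatched records go to one extra last group."""
    position = {k: i for i, k in enumerate(keys)}
    groups = [[] for _ in range(len(keys) + 1)]
    for rec in records:
        # Keys that are not listed fall through to the appended group.
        groups[position.get(key_of(rec), len(keys))].append(rec)
    return groups


# Hypothetical rows; country DDD has two "other" official languages.
rows = [
    {"CountryCode": "AAA", "Language": "English"},
    {"CountryCode": "BBB", "Language": "French"},
    {"CountryCode": "CCC", "Language": "English"},
    {"CountryCode": "DDD", "Language": "Spanish"},
    {"CountryCode": "DDD", "Language": "Quechua"},
    {"CountryCode": "EEE", "Language": "Arabic"},
]
labels = ["Chinese", "English", "French", "Other"]

groups = align_group_with_rest(rows, labels[:3], lambda r: r["Language"])
counts = [
    (label, len(g) if i < 3 else len({r["CountryCode"] for r in g}))
    for i, (label, g) in enumerate(zip(labels, groups))
]
print(counts)
```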

2. Enumeration grouping

Example 1: In order, list the number of cities of each type.

MySQL8:

with t as (select * from world.city where CountryCode='CHN'),
segment(class,start,end) as (select 'tiny', 0, 200000
                             union all select 'small', 200000, 1000000
                             union all select 'medium', 1000000, 2000000
                             union all select 'big', 2000000, 100000000
)
select class, count(1) cnt
from segment s join t on t.population>=s.start and t.population<s.end
group by class, start
order by start;

esProc SPL:

	A
1	=connect("mysql")
2	=A1.query@x("select * from world.city where CountryCode='CHN'")
3	=${string([20,100,200,10000].(~*10000).("?<"/~))}
4	[tiny,small,medium,big]
5	=A2.enum(A3,Population)
6	=A5.new(A4(#):class, ~.len():cnt)

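A3: ${…} is macro substitution: the result of the expression inside the braces is evaluated as a new expression. The result is the sequence ["?<200000","?<1000000","?<2000000","?<100000000"]

A5: For each record in A2, find the first condition in A3 that holds and append the record to the corresponding group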
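
The first-match rule described for A5 can be sketched in plain Python. This is an illustration under stated assumptions, not esProc code: enum_group and the sample cities are hypothetical, and a record that matches no condition is simply dropped here.

```python
def enum_group(records, conditions, value_of):
    """Append each record to the group of the first condition it satisfies."""
    groups = [[] for _ in conditions]
    for rec in records:
        value = value_of(rec)
        for i, cond in enumerate(conditions):
            if cond(value):
                groups[i].append(rec)
                break  # first match wins, later conditions are not tested
    return groups


# Upper bounds mirror the sequence built in A3: ?<200000, ?<1000000, ...
bounds = [200_000, 1_000_000, 2_000_000, 100_000_000]
conditions = [lambda v, b=b: v < b for b in bounds]
classes = ["tiny", "small", "medium", "big"]

# Hypothetical city populations.
cities = [
    {"Name": "c1", "Population": 150_000},
    {"Name": "c2", "Population": 200_000},
    {"Name": "c3", "Population": 999_999},
    {"Name": "c4", "Population": 1_500_000},
    {"Name": "c5", "Population": 2_000_000},
    {"Name": "c6", "Population": 9_000_000},
]

groups = enum_group(cities, conditions, lambda r: r["Population"])
counts = [(c, len(g)) for c, g in zip(classes, groups)]
print(counts)
```

Since only the first matching condition is used, the conditions need upper bounds only. A lower bound for each class, as in the SQL segment table, becomes unnecessary.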

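Example 2: List the number of large cities in East China, the number of large cities in other regions, and the number of cities that are not large.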

MySQL8:

with t as (select * from world.city where CountryCode='CHN')
select 'East&Big' class, count(*) cnt
from t
where population>=2000000
  and district in ('Shanghai','Jiangsu','Shandong','Zhejiang','Anhui','Jiangxi')
union all
select 'Other&Big', count(*)
from t
where population>=2000000
  and district not in ('Shanghai','Jiangsu','Shandong','Zhejiang','Anhui','Jiangxi')
union all
select 'Not Big', count(*)
from t
where population<2000000;

esProc SPL:

	A
1	=connect("mysql")
2	=A1.query@x("select * from world.city where CountryCode='CHN'")
3	[Shanghai,Jiangsu,Shandong,Zhejiang,Anhui,Jiangxi]
4	[?(1)>=2000000 && A3.contain(?(2)), ?(1)>=2000000 && !A3.contain(?(2))]
5	[East&Big,Other&Big, Not Big]
6	=A2.enum@n(A4, [Population,District])
7	=A6.new(A5(#):class, A6(#).len():cnt)

A6: enum@n places the records that satisfy none of the conditions in A4 into an extra group appended at the end
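
A minimal Python sketch of this "catch the rest" behavior follows. The helper enum_group_with_rest, the shortened district set, and the sample rows are hypothetical and only illustrate the idea.

```python
def enum_group_with_rest(records, conditions):
    """First-match enumeration grouping with one extra last group for non-matches."""
    groups = [[] for _ in range(len(conditions) + 1)]
    for rec in records:
        for i, cond in enumerate(conditions):
            if cond(rec):
                groups[i].append(rec)
                break
        else:
            # No condition held: the record belongs to the appended group.
            groups[-1].append(rec)
    return groups


EAST = {"Shanghai", "Zhejiang"}  # shortened, hypothetical district list
conditions = [
    lambda r: r["Population"] >= 2_000_000 and r["District"] in EAST,
    lambda r: r["Population"] >= 2_000_000 and r["District"] not in EAST,
]
classes = ["East&Big", "Other&Big", "Not Big"]

# Hypothetical city rows.
cities = [
    {"Population": 9_000_000, "District": "Shanghai"},
    {"Population": 3_000_000, "District": "Zhejiang"},
    {"Population": 7_000_000, "District": "Peking"},
    {"Population": 500_000, "District": "Zhejiang"},
    {"Population": 100_000, "District": "Hubei"},
]

groups = enum_group_with_rest(cities, conditions)
counts = [(c, len(g)) for c, g in zip(classes, groups)]
print(counts)
```

The "Not Big" class needs no condition of its own, which is why the list of conditions has one entry fewer than the list of class names.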

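Example 3: List the number of large cities in all regions, the number of large cities in East China, and the number of cities that are not large.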

MySQL8:

with t as (select * from world.city where CountryCode='CHN')
select 'Big' class, count(*) cnt
from t
where population>=2000000
union all
select 'East&Big' class, count(*) cnt
from t
where population>=2000000
  and district in ('Shanghai','Jiangsu','Shandong','Zhejiang','Anhui','Jiangxi')
union all
select 'Not Big' class, count(*) cnt
from t
where population<2000000;

esProc SPL:

	A
1	=connect("mysql")
2	=A1.query@x("select * from world.city where CountryCode='CHN'")
3	[Shanghai,Jiangsu,Shandong,Zhejiang,Anhui,Jiangxi]
4	[?(1)>=2000000, ?(1)>=2000000 && A3.contain(?(2))]
5	[Big, East&Big, Not Big]
6	=A2.enum@rn(A4, [Population,District])
7	=A6.new(A5(#):class, A6(#).len():cnt)

A6: When a record in A2 satisfies several conditions in A4, enum@r appends it to every corresponding group
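
To make the overlapping case concrete, here is a minimal Python sketch in which a record may land in several groups. The helper enum_group_repeat, the shortened district set, and the rows are hypothetical; the extra last group for non-matching records mirrors the n option used together with r in A6.

```python
def enum_group_repeat(records, conditions):
    """Append each record to every group whose condition it satisfies.

    Records that satisfy no condition go to one extra last group.
    """
    groups = [[] for _ in range(len(conditions) + 1)]
    for rec in records:
        matched = False
        for i, cond in enumerate(conditions):
            if cond(rec):
                groups[i].append(rec)
                matched = True  # keep testing the remaining conditions
        if not matched:
            groups[-1].append(rec)
    return groups


EAST = {"Shanghai", "Zhejiang"}  # shortened, hypothetical district list
conditions = [
    lambda r: r["Population"] >= 2_000_000,
    lambda r: r["Population"] >= 2_000_000 and r["District"] in EAST,
]
classes = ["Big", "East&Big", "Not Big"]

# Hypothetical city rows.
cities = [
    {"Population": 9_000_000, "District": "Shanghai"},
    {"Population": 3_000_000, "District": "Zhejiang"},
    {"Population": 7_000_000, "District": "Peking"},
    {"Population": 500_000, "District": "Zhejiang"},
    {"Population": 100_000, "District": "Hubei"},
]

groups = enum_group_repeat(cities, conditions)
counts = [(c, len(g)) for c, g in zip(classes, groups)]
print(counts)
```

The group sizes add up to more than the number of input rows, because the two large East China cities are counted under both "Big" and "East&Big".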

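3. Positional grouping that uses the returned value directly as the group number

Example 1: In order, list the number of cities of each type.

MySQL8: See the SQL under "Enumeration grouping"

esProc SPL: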

	A
1	=connect("mysql")
2	=A1.query@x("select * from world.city where CountryCode='CHN'")
3	=[0,20,100,200].(~*10000)
4	[tiny,small,medium,big]
5	=A2.group@n(A3.pseg(Population))
6	=A5.new(A4(#):class, ~.len():cnt)

A5: First compute the segment number of each Population value of A2 within A3, then group by position using that segment number
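
The same two steps can be sketched in Python with the standard bisect module. This is only an illustration: group_by_segment and the sample cities are hypothetical, and the sketch assumes each segment is closed on the left and open on the right, so that a population of exactly 200000 falls into the second segment.

```python
from bisect import bisect_right


def group_by_segment(records, lower_bounds, value_of):
    """Group records by the 1-based number of the segment their value falls in.

    `lower_bounds` must be ascending; segment k covers
    lower_bounds[k-1] <= value < lower_bounds[k].
    """
    groups = [[] for _ in lower_bounds]
    for rec in records:
        seg = bisect_right(lower_bounds, value_of(rec))  # 0 means below all bounds
        if seg > 0:
            groups[seg - 1].append(rec)
    return groups


lower_bounds = [n * 10_000 for n in (0, 20, 100, 200)]
classes = ["tiny", "small", "medium", "big"]

# Hypothetical city populations.
cities = [
    {"Name": "c1", "Population": 150_000},
    {"Name": "c2", "Population": 200_000},
    {"Name": "c3", "Population": 999_999},
    {"Name": "c4", "Population": 1_500_000},
    {"Name": "c5", "Population": 2_000_000},
    {"Name": "c6", "Population": 9_000_000},
]

groups = group_by_segment(cities, lower_bounds, lambda r: r["Population"])
counts = [(c, len(g)) for c, g in zip(classes, groups)]
print(counts)
```

A binary search per record replaces the chain of condition tests used in enumeration grouping, which is the practical benefit of turning the value into a group number directly.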

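4. Grouping adjacent records while keeping the original order

Example 1: List the nation at the top of the gold medal table for each of the first 10 Olympic Games (the olympic table holds only the top 3 results of each Games, and no two entries have exactly the same medals).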

MySQL8:

with t1 as (select *,
                   rank() over(partition by game
                               order by gold*1000000+silver*1000+copper desc) rn
            from olympic where game<=10)
select game,nation,gold,silver,copper from t1 where rn=1;

esProc SPL:

	A
1	=connect("mysql")
2	=A1.query("select * from olympic where game<=10 order by game, gold*1000000+silver*1000+copper desc")
3	=A2.group@o1(game)

A3: Group the records in their original order and take the first record of each group to form a new table sequence
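
The idea of "group in original order, keep the first record" maps closely onto itertools.groupby, which also groups only adjacent items. The sketch below is an illustration with hypothetical medal rows (not real Olympic results) and a hypothetical helper first_of_each_run.

```python
from itertools import groupby


def first_of_each_run(records, key_of):
    """Return the first record of every run of adjacent records sharing a key."""
    return [next(run) for _, run in groupby(records, key=key_of)]


def score(r):
    """Single sortable score, same weighting as in the queries above."""
    return r["gold"] * 1_000_000 + r["silver"] * 1_000 + r["copper"]


# Hypothetical medal rows, deliberately unsorted.
rows = [
    {"game": 2, "nation": "Y", "gold": 5, "silver": 9, "copper": 1},
    {"game": 1, "nation": "X", "gold": 11, "silver": 7, "copper": 2},
    {"game": 1, "nation": "Y", "gold": 10, "silver": 17, "copper": 19},
    {"game": 2, "nation": "Z", "gold": 26, "silver": 3, "copper": 0},
]

# Same ordering as the SQL: by game, then by score descending.
ordered = sorted(rows, key=lambda r: (r["game"], -score(r)))
winners = first_of_each_run(ordered, lambda r: r["game"])
winner_names = [(r["game"], r["nation"]) for r in winners]
print(winner_names)
```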

Example 2: Find the largest number of consecutive Olympic Games in which the same nation ranked first overall.

MySQL8:

with t1 as (select *,
                   rank() over(partition by game
                               order by gold*1000000+silver*1000+copper desc) rn
            from olympic),
t2 as (select game, ifnull(nation<>lag(nation) over(order by game),0) neq
       from t1 where rn=1),
t3 as (select sum(neq) over(order by game) acc from t2),
t4 as (select count(acc) cnt from t3 group by acc)
select max(cnt) cnt from t4;

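t1: Compute the ranking within each Games

t2: List the first-ranked nation of each Games and set the flag neq according to whether nation differs from the previous Games (1 if different, 0 if the same)

t3: Accumulate the flag neq into acc, so that adjacent rows with the same nation share one acc value and rows in different runs get different acc values

t4: Count the rows of each acc value, that is, the length of each run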
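
The flag-and-accumulate trick used in t2 to t4 is easier to follow on a small list. The following Python sketch replays it step by step; the winners list is hypothetical and the variable names neq and acc simply mirror the column names in the SQL.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical first-ranked nation of seven consecutive Games.
winners = ["A", "A", "B", "A", "A", "A", "C"]

# t2: 1 when the nation differs from the previous Games, 0 otherwise.
# The first row has no predecessor, so its flag is 0 (the ifnull in the SQL).
neq = [0] + [int(cur != prev) for prev, cur in zip(winners, winners[1:])]

# t3: a running sum gives every run of equal adjacent nations its own id.
acc = list(accumulate(neq))

# t4 and the final select: size of each run, then the largest one.
run_lengths = Counter(acc)
longest = max(run_lengths.values())

print(neq, acc, longest)
```

Nation "A" wins five times in total, but the two runs of "A" receive different acc values, so only the longer run of three counts.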

esProc SPL:

	A
1	=connect("mysql")
2	=A1.query("select * from olympic order by game, gold*1000000+silver*1000+copper desc")
3	=A2.group@o1(game)
4	=A3.group@o(nation)
5	=A4.max(~.len())

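A4: Put adjacent records with the same nation into the same group, keeping the original order

A5: The maximum group length is the longest streak, counted in Games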
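
For comparison, grouping adjacent equal values directly takes only a few lines in Python as well. This sketch uses itertools.groupby as a stand-in for order-preserving adjacent grouping; longest_run_length and the winners list are hypothetical.

```python
from itertools import groupby


def longest_run_length(values):
    """Length of the longest run of equal adjacent values (0 for empty input)."""
    return max((sum(1 for _ in run) for _, run in groupby(values)), default=0)


# Hypothetical first-ranked nation of seven consecutive Games.
winners = ["A", "A", "B", "A", "A", "A", "C"]
longest = longest_run_length(winners)
print(longest)
```

No change flag or running sum is needed, because the grouping itself is defined on adjacency.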

Example 3: List the details of each Games in the longest streak of ranking first overall.

MySQL8:

with t1 as (select *,
                   rank() over(partition by game
                               order by gold*1000000+silver*1000+copper desc) rn
            from olympic),
t2 as (select *, ifnull(nation<>lag(nation) over(order by game),0) neq
       from t1 where rn=1),
t3 as (select *, sum(neq) over(order by game) acc from t2),
t4 as (select acc, count(acc) cnt from t3 group by acc),
t5 as (select * from t4 where cnt=(select max(cnt) cnt from t4))
select game,nation,gold,silver,copper from t3 join t5 using (acc);

esProc SPL:

	A
1	=connect("mysql")
2	=A1.query("select * from olympic order by game, gold*1000000+silver*1000+copper desc")
3	=A2.group@o1(game)
4	=A3.group@o(nation)
5	=A4.maxp(~.len())

A5: Return the group with the greatest length
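
Returning the group itself rather than its length can be sketched the same way in Python. The helper longest_run and the rows are hypothetical. When several runs share the maximum length, this sketch returns the first one; whether that matches SPL in every case is an assumption.

```python
from itertools import groupby


def longest_run(records, key_of):
    """Return the longest run of adjacent records sharing a key ([] if empty)."""
    runs = [list(run) for _, run in groupby(records, key=key_of)]
    return max(runs, key=len, default=[])


# Hypothetical first-ranked rows, one per Games, already ordered by game.
nations = ["A", "A", "B", "A", "A", "A", "C"]
champions = [{"game": i + 1, "nation": n} for i, n in enumerate(nations)]

streak = longest_run(champions, lambda r: r["nation"])
streak_games = [r["game"] for r in streak]
print(streak_games)
```

Because the group keeps its member records, the detail rows are available without joining back to an intermediate result, as t3 join t5 does in the SQL.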
