前言:

从朋友那里拿到了一个他面试某公司的sql笔试题,感觉还是挺有意思的,就放在这里供大家讨论看看,我自己写了一版,但是不知道是否正确,会不会有没考虑到的情况漏掉了导致不能取到数,同时也希望大家提供更好的解题思路。

题目:
有一场篮球赛,参赛双方是A队和B队,场边记录员记录下了每次得分的详细信息:
team:队名
number:球衣号,
name:球员姓名,
score_time:得分时间,
score:当次得分

问(用sql表达):
1)输出每一次的比分的反超时刻,以及对应的完成反超的球员姓名
2)输出连续三次或以上得分的球员姓名,以及那一拨连续得分的数值

首先,我按照题目生成了38条随机数据,插入了hive数据库中,如下图所示:

--见表语句
create table test.basketball_game_score_detail(
team  string,
number int,
score_time  string,
score  int,
NAME string
);
----插入语句,如下所示
insert into table test.basketball_game_score_detail values('A',1,'2020/8/28 9:01:14',1,'A1');
insert into table test.basketball_game_score_detail values('A',5,'2020/8/28 9:02:28',1,'A5');
insert into table test.basketball_game_score_detail values('B',4,'2020/8/28 9:03:42',3,'B4');
insert into table test.basketball_game_score_detail values('A',4,'2020/8/28 9:04:55',3,'A4');
insert into table test.basketball_game_score_detail values('B',1,'2020/8/28 9:06:09',3,'B1');
insert into table test.basketball_game_score_detail values('A',3,'2020/8/28 9:07:23',3,'A3');
insert into table test.basketball_game_score_detail values('A',4,'2020/8/28 9:08:37',3,'A4');
insert into table test.basketball_game_score_detail values('B',1,'2020/8/28 9:09:51',2,'B1');
insert into table test.basketball_game_score_detail values('B',2,'2020/8/28 9:11:05',2,'B2');
insert into table test.basketball_game_score_detail values('B',4,'2020/8/28 9:12:18',1,'B4');
insert into table test.basketball_game_score_detail values('A',1,'2020/8/28 9:13:32',2,'A1');
insert into table test.basketball_game_score_detail values('A',1,'2020/8/28 9:14:46',1,'A1');
insert into table test.basketball_game_score_detail values('A',4,'2020/8/28 9:16:00',1,'A4');
insert into table test.basketball_game_score_detail values('B',3,'2020/8/28 9:17:14',3,'B3');
insert into table test.basketball_game_score_detail values('B',2,'2020/8/28 9:18:28',3,'B2');
insert into table test.basketball_game_score_detail values('A',2,'2020/8/28 9:19:42',3,'A2');
insert into table test.basketball_game_score_detail values('A',1,'2020/8/28 9:20:55',1,'A1');
insert into table test.basketball_game_score_detail values('B',3,'2020/8/28 9:22:09',2,'B3');
insert into table test.basketball_game_score_detail values('B',3,'2020/8/28 9:23:23',3,'B3');
insert into table test.basketball_game_score_detail values('A',5,'2020/8/28 9:24:37',2,'A5');
insert into table test.basketball_game_score_detail values('B',1,'2020/8/28 9:25:51',3,'B1');
insert into table test.basketball_game_score_detail values('B',2,'2020/8/28 9:27:05',1,'B2');
insert into table test.basketball_game_score_detail values('A',3,'2020/8/28 9:28:18',1,'A3');
insert into table test.basketball_game_score_detail values('B',4,'2020/8/28 9:29:32',1,'B4');
insert into table test.basketball_game_score_detail values('A',1,'2020/8/28 9:30:46',3,'A1');
insert into table test.basketball_game_score_detail values('B',1,'2020/8/28 9:32:00',1,'B1');
insert into table test.basketball_game_score_detail values('A',4,'2020/8/28 9:33:14',2,'A4');
insert into table test.basketball_game_score_detail values('B',1,'2020/8/28 9:34:28',1,'B1');
insert into table test.basketball_game_score_detail values('B',5,'2020/8/28 9:35:42',2,'B5');
insert into table test.basketball_game_score_detail values('A',1,'2020/8/28 9:36:55',1,'A1');
insert into table test.basketball_game_score_detail values('B',1,'2020/8/28 9:38:09',3,'B1');
insert into table test.basketball_game_score_detail values('A',1,'2020/8/28 9:39:23',3,'A1');
insert into table test.basketball_game_score_detail values('B',2,'2020/8/28 9:40:37',3,'B2');
insert into table test.basketball_game_score_detail values('A',3,'2020/8/28 9:41:51',3,'A3');
insert into table test.basketball_game_score_detail values('A',1,'2020/8/28 9:43:05',2,'A1');
insert into table test.basketball_game_score_detail values('B',3,'2020/8/28 9:44:18',3,'B3');
insert into table test.basketball_game_score_detail values('A',5,'2020/8/28 9:45:32',2,'A5');
insert into table test.basketball_game_score_detail values('B',5,'2020/8/28 9:46:46',3,'B5');

数据分析常见sql题 数据分析sql笔试题_笔试题

1.输出每一次的比分的反超时刻,以及对应的完成反超的球员姓名

-- 首先,计算出每次得分时间是,两支队伍分别的比分是多少,进行比较
select team,number,name,score_time
,sum(A_score)over(order by score_time) a_sum_score2 --计算每个时点A队的累计得分
,sum(b_score)over(order by score_time) b_sum_score2 --计算每个时点B队的累计得分
from (
SELECT TEAM,number,name,score_time
,coalesce(case when team='A' then score end,0) as A_score --如果某个得分时点B队得分了,A队没有得分,那么A对在这个时点的得分置为0
,coalesce(case when team='B' then score end,0) as B_score --如果某个得分时点A队得分了,B队没有得分,那么B对在这个时点的得分置为0
FROM test.basketball_game_score_detail
ORDER BY SCORE_time
) x
/*结果如下:
team	number	name	score_time	a_score	b_score	a_sum_score2	b_sum_score2
A	1	A1	2020/8/28 9:01:14	1	0	1	0
A	5	A5	2020/8/28 9:02:28	1	0	2	0
B	4	B4	2020/8/28 9:03:42	0	3	2	3
A	4	A4	2020/8/28 9:04:55	3	0	5	3
B	1	B1	2020/8/28 9:06:09	0	3	5	6
A	3	A3	2020/8/28 9:07:23	3	0	8	6
A	4	A4	2020/8/28 9:08:37	3	0	11	6
B	1	B1	2020/8/28 9:09:51	0	2	11	8
B	2	B2	2020/8/28 9:11:05	0	2	11	10
B	4	B4	2020/8/28 9:12:18	0	1	11	11
*/
---然后,如上所示,计算出了所得结果,但是我们要取出反超的时刻,反超我理解的是,在上一局A的得分>=B,到下一局就变成A<B,这种情况,反之亦然;
--因此,我算出了每个得分时刻两队的累计分差score_gap以及last_score_gap(按照时间,后置一格,即上一局的得分差),两个字段相乘应该是<=0;
--且要排除累计得分相同的情况,相乘为0 情况
select *,score_gap*last_score_gap
from (select  *,a_sum_score2-b_sum_score2 as score_gap
,lag(a_sum_score2-b_sum_score2)over(order by score_time) as last_score_gap
from (select team,number,name,score_time,A_score,b_score
,sum(A_score)over(order by score_time) a_sum_score2
,sum(b_score)over(order by score_time) b_sum_score2
from (
SELECT TEAM,number,name,score_time,coalesce(case when team='A' then score end,0) as A_score
,coalesce(case when team='B' then score end,0) as B_score
FROM test.basketball_game_score_detail
ORDER BY SCORE_time
) x
)y
) z
where z.score_gap*last_score_gap<=0
and a_sum_score2<>b_sum_score2 --排除得分相等的时点,这些时点肯定不考虑
;
---所有反超时刻,一共8次---
team	number	name	score_time	a_score	b_score	a_sum_score2	b_sum_score2	score_gap	last_score_gap	_c1
B	4	B4	2020/8/28 9:03:42	0	3	2	3	-1	2	-2
A	4	A4	2020/8/28 9:04:55	3	0	5	3	2	-1	-2
B	1	B1	2020/8/28 9:06:09	0	3	5	6	-1	2	-2
A	3	A3	2020/8/28 9:07:23	3	0	8	6	2	-1	-2
A	1	A1	2020/8/28 9:13:32	2	0	13	11	2	0	0
B	2	B2	2020/8/28 9:18:28	0	3	15	17	-2	1	-2
A	2	A2	2020/8/28 9:19:42	3	0	18	17	1	-2	-2
B	3	B3	2020/8/28 9:23:23	0	3	19	22	-3	0	0

2.输出连续三次或以上得分的球员姓名,以及那一拨连续得分的数值
我的想法是首先拉出得了三次以及上的球员名单,再去研究他们每一次比赛得分是否是连续的;

select x2.number,x2.name,score_time_rank
,row_number()over(partition by x2.team,x2.number order by score_time) as rank --按照每个球员对其每场比赛按照时间顺序进行排序
,x2.score_time_rank-row_number()over(partition by x2.team,x2.number order by score_time) as new_rank
from (
SELECT TEAM,number,name,count(1) as cnt
FROM test.basketball_game_score_detail
group by TEAM,number,name
having count(1)>=3 -- 先把有得过3次及以上的对员及其记录拉出来
)x1
left outer join (
select *,row_number()over(order by score_time) as score_time_rank  --按时间对每一局排序
from test.basketball_game_score_detail
) x2 on x1.team=x2.team and x1.number=x2.number
order by x2.number,x2.name,score_time_rank
;
number	name	score_time_rank	rank	new_rank
1	A1	1	1	0
1	A1	11	2	9 --球员A1的这两条比赛就是连续的,如果是连续的那么score_time_rank-rank的值应该是相等的,因此要求三次连续以上,也就是将每个球员具有多次new_rank的对应记录拉出来;
1	A1	12	3	9
1	A1	17	4	13
1	A1	25	5	20
1	A1	30	6	24
1	A1	32	7	25
1	A1	35	8	27
1	B1	5	1	4
1	B1	8	2	6
1	B1	21	3	18

根据上述方式,发现好像没有必要先将得分超过3次及以上用户的名单拉出来,完全可以在最后通过count(new_rank)>=3来排除,因此做了一下修改,最后结果如下所示:

create table test.tmp_basketball_game_user_detail as
select team,number,name,new_rank,count(1) as num
from (
select x2.team,x2.number,x2.name,score_time_rank
,row_number()over(partition by x2.team,x2.number order by score_time) as rank
,x2.score_time_rank-row_number()over(partition by x2.team,x2.number order by score_time) as new_rank
from  (
select *,row_number()over(order by score_time) as score_time_rank  --按时间对每一局排序
from test.basketball_game_score_detail
) x2 
order by x2.number,x2.name,score_time_rank
) y
group by team,number,name,new_rank
having  count(1)>=3

至此可以拉出在3次及以上球员的名单,具体的得分可以再次关联一下原表,有兴趣的可以自己尝试一下;
因为我在生成样本数据的时候没有生成3次以上的球员,因此最后的结果是空的,有兴趣可以自己尝试改两条数据的球员name;