(图片来源于网络,侵删)
Hive练习题👇
题目如下👇
背景说明:
以下表记录了用户每天的蚂蚁森林低碳生活领取的记录流水
table_name:user_low_carbon
字段 | 描述 |
user_id | 用户ID |
data_dt | 日期 |
low_carbon | 减少碳排放(g) |
建表语句👇
create table user_low_carbon(user_id String,data_dt String,low_carbon int)
row format delimited fields terminated by '\t';
蚂蚁森林植物换购表,用于记录申领环保植物所需要减少的碳排放量
table_name: plant_carbon
字段 | 描述 |
plant_id | 植物编号 |
plant_name | 植物名 |
low_carbon | 换购植物所需要的碳 |
建表语句👇
create table plant_carbon(plant_id string,plant_name String,low_carbon int)
row format delimited fields terminated by '\t';
题目1.蚂蚁森林植物申领统计
问题:假设2017年1月1日开始记录低碳数据(user_low_carbon),假设2017年10月1日之前满足申领条件的用户都申领了一颗p004-胡杨,
剩余的能量全部用来领取“p002-沙柳” 。
统计在10月1日累计申领“p002-沙柳” 排名前10的用户信息;以及他比后一名多领了几颗沙柳
得到的统计结果如下表样式:
user_id plant_count less_count(比后一名多领了几颗沙柳)
u_101 20 5
u_088 15 1
u_103 14 ...
题目2、蚂蚁森林低碳用户排名分析
问题:查询user_low_carbon表中每日流水记录,条件为:
用户在2017年,连续三天(或以上)的天数里,
每天减少碳排放(low_carbon)都超过100g的用户低碳流水。
需要查询返回满足以上条件的user_id
例如用户u_002符合条件的记录如下,因为2017/1/2~2017/1/5连续四天的碳排放量之和都大于等于100g:
user_id
u_002
u_005
u_008
u_009
u_010
u_011
u_013
u_014
备注:统计方法不限于sql、procedure、python,java等
数据如下👇
plant_carbon表数据👇
p001 梭梭树 17
p002 沙柳 19
p003 樟子树 146
p004 胡杨 215
user_low_carbon表数据👇
u_001 2017/1/1 10
u_001 2017/1/2 150
u_001 2017/1/2 110
u_001 2017/1/2 10
u_001 2017/1/4 50
u_001 2017/1/4 10
u_001 2017/1/6 45
u_001 2017/1/6 90
u_002 2017/1/1 10
u_002 2017/1/2 150
u_002 2017/1/2 70
u_002 2017/1/3 30
u_002 2017/1/3 80
u_002 2017/1/4 150
u_002 2017/1/5 101
u_002 2017/1/6 68
u_003 2017/1/1 20
u_003 2017/1/2 10
u_003 2017/1/2 150
u_003 2017/1/3 160
u_003 2017/1/4 20
u_003 2017/1/5 120
u_003 2017/1/6 20
u_003 2017/1/7 10
u_003 2017/1/7 110
u_004 2017/1/1 110
u_004 2017/1/2 20
u_004 2017/1/2 50
u_004 2017/1/3 120
u_004 2017/1/4 30
u_004 2017/1/5 60
u_004 2017/1/6 120
u_004 2017/1/7 10
u_004 2017/1/7 120
u_005 2017/1/1 80
u_005 2017/1/2 50
u_005 2017/1/2 80
u_005 2017/1/3 180
u_005 2017/1/4 180
u_005 2017/1/4 10
u_005 2017/1/5 80
u_005 2017/1/6 280
u_005 2017/1/7 80
u_005 2017/1/7 80
u_006 2017/1/1 40
u_006 2017/1/2 40
u_006 2017/1/2 140
u_006 2017/1/3 210
u_006 2017/1/3 10
u_006 2017/1/4 40
u_006 2017/1/5 40
u_006 2017/1/6 20
u_006 2017/1/7 50
u_006 2017/1/7 240
u_007 2017/1/1 130
u_007 2017/1/2 30
u_007 2017/1/2 330
u_007 2017/1/3 30
u_007 2017/1/4 530
u_007 2017/1/5 30
u_007 2017/1/6 230
u_007 2017/1/7 130
u_007 2017/1/7 30
u_008 2017/1/1 160
u_008 2017/1/2 60
u_008 2017/1/2 60
u_008 2017/1/3 60
u_008 2017/1/4 260
u_008 2017/1/5 360
u_008 2017/1/6 160
u_008 2017/1/7 60
u_008 2017/1/7 60
u_009 2017/1/1 70
u_009 2017/1/2 70
u_009 2017/1/2 70
u_009 2017/1/3 170
u_009 2017/1/4 270
u_009 2017/1/5 70
u_009 2017/1/6 70
u_009 2017/1/7 70
u_009 2017/1/7 70
u_010 2017/1/1 90
u_010 2017/1/2 90
u_010 2017/1/2 90
u_010 2017/1/3 90
u_010 2017/1/4 90
u_010 2017/1/4 80
u_010 2017/1/5 90
u_010 2017/1/5 90
u_010 2017/1/6 190
u_010 2017/1/7 90
u_010 2017/1/7 90
u_011 2017/1/1 110
u_011 2017/1/2 100
u_011 2017/1/2 100
u_011 2017/1/3 120
u_011 2017/1/4 100
u_011 2017/1/5 100
u_011 2017/1/6 100
u_011 2017/1/7 130
u_011 2017/1/7 100
u_012 2017/1/1 10
u_012 2017/1/2 120
u_012 2017/1/2 10
u_012 2017/1/3 10
u_012 2017/1/4 50
u_012 2017/1/5 10
u_012 2017/1/6 20
u_012 2017/1/7 10
u_012 2017/1/7 10
u_013 2017/1/1 50
u_013 2017/1/2 150
u_013 2017/1/2 50
u_013 2017/1/3 150
u_013 2017/1/4 550
u_013 2017/1/5 350
u_013 2017/1/6 50
u_013 2017/1/7 20
u_013 2017/1/7 60
u_014 2017/1/1 220
u_014 2017/1/2 120
u_014 2017/1/2 20
u_014 2017/1/3 20
u_014 2017/1/4 20
u_014 2017/1/5 250
u_014 2017/1/6 120
u_014 2017/1/7 270
u_014 2017/1/7 20
u_015 2017/1/1 10
u_015 2017/1/2 20
u_015 2017/1/2 10
u_015 2017/1/3 10
u_015 2017/1/4 20
u_015 2017/1/5 70
u_015 2017/1/6 10
u_015 2017/1/7 80
u_015 2017/1/7 60
参考答案如下👇
题目1
(前提:前十名不考虑棵树一样,只要10条数据)
步骤1:求出2017年10月1日之前每个人的low_carbon总量,并获取low_carbon总量前11名
select
user_id,
sum(low_carbon) sum_low_carbon
from
user_low_carbon
where
date_format(regexp_replace(data_dt,"/","-"),"yyyy-MM")<"2017-10"
group by user_id
order by sum_low_carbon desc
limit 11;t1
步骤2:求出申领一颗胡杨所需要的low_carbon
select
low_carbon
from
plant_carbon
where plant_name='胡杨';t2
步骤3:求出申领一颗沙柳所需要的low_carbon
select
low_carbon
from
plant_carbon
where plant_name='沙柳';t3
步骤4:求出换取一颗胡杨之后剩余可换取的沙柳个数(取整)
select
user_id,
floor((t1.sum_low_carbon-t2.low_carbon)/t3.low_carbon) low_carbon_count
from
(select
user_id,
sum(low_carbon) sum_low_carbon
from
user_low_carbon
where
date_format(regexp_replace(data_dt,"/","-"),"yyyy-MM")<"2017-10"
group by user_id
order by sum_low_carbon desc
limit 11)t1,
(select
low_carbon
from
plant_carbon
where plant_name='胡杨')t2,
(select
low_carbon
from
plant_carbon
where plant_name='沙柳')t3;t4
步骤5:求出用户后一名的沙柳个数
select
user_id,
low_carbon_count,
lead(low_carbon_count,1) over(order by low_carbon_count desc) next_low_carbon_count
from
(select
user_id,
floor((t1.sum_low_carbon-t2.low_carbon)/t3.low_carbon) low_carbon_count
from
(select
user_id,
sum(low_carbon) sum_low_carbon
from
user_low_carbon
where
date_format(regexp_replace(data_dt,"/","-"),"yyyy-MM")<"2017-10"
group by user_id
order by sum_low_carbon desc
limit 11)t1,
(select
low_carbon
from
plant_carbon
where plant_name='胡杨')t2,
(select
low_carbon
from
plant_carbon
where plant_name='沙柳')t3)t4;
步骤6:求出用户比后一名多领了几颗沙柳
select
user_id,
low_carbon_count,
(low_carbon_count-next_low_carbon_count) low_carbon_diff
from
(select
user_id,
low_carbon_count,
lead(low_carbon_count,1) over(order by low_carbon_count desc) next_low_carbon_count
from
(select
user_id,
floor((t1.sum_low_carbon-t2.low_carbon)/t3.low_carbon) low_carbon_count
from
(select
user_id,
sum(low_carbon) sum_low_carbon
from
user_low_carbon
where
date_format(regexp_replace(data_dt,"/","-"),"yyyy-MM")<"2017-10"
group by user_id
order by sum_low_carbon desc
limit 11)t1,
(select
low_carbon
from
plant_carbon
where plant_name='胡杨')t2,
(select
low_carbon
from
plant_carbon
where plant_name='沙柳')t3)t4)t5
limit 10;
题目2
步骤1:查询在2017年low_carbon超过100g的用户低碳流水的日期
select
user_id,
date_format(regexp_replace(data_dt,"/","-"),"yyyy-MM-dd") date
from
user_low_carbon
where substring(data_dt,1,4)='2017'
group by user_id,data_dt
having sum(low_carbon)>100;t1
步骤2:针对所有用户设置日期的rank排名(使用数学上的等差进行解决问题)
select
user_id,
date,
rank() over(partition by user_id order by date) rk
from
(select
user_id,
date_format(regexp_replace(data_dt,"/","-"),"yyyy-MM-dd") date
from
user_low_carbon
where substring(data_dt,1,4)='2017'
group by user_id,data_dt
having sum(low_carbon)>100)t1;t2
步骤3:日期减去排名
select
user_id,
date,
date_sub(date,rk) datediff
from
(select
user_id,
date,
rank() over(partition by user_id order by date) rk
from
(select
user_id,
date_format(regexp_replace(data_dt,"/","-"),"yyyy-MM-dd") date
from
user_low_carbon
where substring(data_dt,1,4)='2017'
group by user_id,data_dt
having sum(low_carbon)>100)t1)t2;t3
步骤4:根据用户和datediff进行分组,判断个数是否大于3
select
user_id
from
(select
user_id,
date,
date_sub(date,rk) datediff
from
(select
user_id,
date,
rank() over(partition by user_id order by date) rk
from
(select
user_id,
date_format(regexp_replace(data_dt,"/","-"),"yyyy-MM-dd") date
from
user_low_carbon
where substring(data_dt,1,4)='2017'
group by user_id,data_dt
having sum(low_carbon)>100)t1)t2 )t3
group by user_id,datediff
having count(*) >= 3
都看到这里了,点赞评论一下吧!!!