hive 返回对应周 hive函数大全过往记忆

转载

mob6454cc6c8549 2023-07-12 23:57:48

文章标签 hive 返回对应周 hive unix json 字段 文章分类 Hive 大数据

Hive常用函数总结一

1. select ..A.. from ..B.. where ..c..

查看列名字段desc
选出城市在北京，性别为女性的10名用户
查看交易表名desc

2. group by

2019年2月到5月，每个品类有多少人购买，累计金额是多少
group by .. having..

3. order by...

2019年5月，支付金额最多的top 5 用户

5. from_unixtime unix_timestamp

把时间戳转换为日期

6. data_diff data_add data_sub

用户首次激活时间，与2019年5月1日的日期间隔

7. 条件函数

case when
if函数

8. 字符串函数

每个月新激活的用户数：
不同手机品牌的用户数get_json_object：

9. 聚合统计函数

ELLA用户2018年平均支付金额，以及2018年最大支付日期和最小支付日期的时间间隔
用户激活时间在2018年，年龄段在20-30和30-40的婚姻状况分布

10. 执行顺序

1. select …A… from …B… where …c…

A:列名 B：表名 C：筛选条件

查看列名字段desc

hive 返回对应周 hive函数大全过往记忆_hive 返回对应周

选出城市在北京，性别为女性的10名用户

select user_name
from user_info
where city='beijing' and sex='female'
limit 10;

hive 返回对应周 hive函数大全过往记忆_json_02

查看交易表名desc

hive 返回对应周 hive函数大全过往记忆_json_03

对分区表，where 条件中必须对分区字段进行限制：

常见错误，未对分区进行限制示例：

select user_name, piece, pay_amount
from user_trade
where dt='2019-05-01' and good_category='food'

hive 返回对应周 hive函数大全过往记忆_hive_04

2. group by

2019年2月到5月，每个品类有多少人购买，累计金额是多少

select goods_category,
count(distinct user_name)  as user_num,
sum(pay_amount) as total_amount
from user_trade
where dt between '2019-02-1' and '2019-05-01'
group by goods_category;

hive 返回对应周 hive函数大全过往记忆_字段_05

group by … having…

2019年4月，支付金额超过5万元的用户

select user_name, sum(pay_amount) as total_amount
from user_trade
where dt between 2019-04-01'  and '2019-0430'
group by user_name having sum(pay_amount) >50000;

hive 返回对应周 hive函数大全过往记忆_unix_06

3. order by…

2019年5月，支付金额最多的top 5 用户

select user_name,sum(pay_amount) as total_amount
from user_trade
where dt between '2019-05-01' and '2019-05-30'
group by user_name
order by total_amount desc 
limit 5;

ASC :升序（默认） DESC：降序

对多个字段进行排列时： order by A ASC, B DESC

hive 返回对应周 hive函数大全过往记忆_unix_07

5. from_unixtime unix_timestamp

把时间戳转换为日期

select  pay_time,from_unixtime(pay_time,'yyyy-MM-dd hh:m:ss')
from user_trade
where dt= '2019-04-09';

hive 返回对应周 hive函数大全过往记忆_json_08

from_unixtime(bigint unixtime, string format)

format:
yyyy-MM-dd hh:mm:ss
yyyy-MM-dd hh
yyyy-MM-dd hh:mm
yyyyMMdd
把日期转换为时间戳：
unix_timestamp(string date)

6. data_diff data_add data_sub

用户首次激活时间，与2019年5月1日的日期间隔

select  user_name, 
	data_diff('2019-05-01',todate(firstactivetime))
from user_info
limt 10;

hive 返回对应周 hive函数大全过往记忆_字段_09

datadif (string enddate, string startdate)

日期增加减少函数：

data_add(string startdate, int days)

data_sub(string startdate, int days)

7. 条件函数

case when

统计20岁以下，20-30，30-40，40以上的用户数：

select case when age < 20 then '20s岁以下'
					when age>=20 and age<30 then '20-30岁'
					when age>=30 and age <40 then '30-40岁'
					else ‘40岁以上’ end,
					count (distinct user_id) user_num
from user_info
group by case when age<20 then '20岁以下'
						when age>=20 and age<30 then '20-30岁'
						when age>=30 and age<40 then '30-40岁'
						else '40岁以上'  end;

hive 返回对应周 hive函数大全过往记忆_hive_10

if函数

统计每个性别用户等级高低的分布情况：

select sex,
		if(level>5,‘高’，‘低’），
		count(distinct user_id)  user_num
from user_info
group by sex,
				if(level>5,'高'，'低')；

hive 返回对应周 hive函数大全过往记忆_hive_11

8. 字符串函数

每个月新激活的用户数：

select  substr(firstactivetime,1,7) as month
			count(distinct user_id) as user_num
from  user_info
group by substr(firstactivetime,1,7);

hive 返回对应周 hive函数大全过往记忆_hive 返回对应周_12

substr(string A, int start, int len)

不同手机品牌的用户数get_json_object：

select get_json_object(extral,'$.phonebrand') as phone_brand
			count(distinct user_id) user_num
from user_info
group by get_json_object(extral, '$.phonebrand');

hive 返回对应周 hive函数大全过往记忆_hive 返回对应周_13

hive 返回对应周 hive函数大全过往记忆_hive 返回对应周_14

get_json_object(string json_string, string path)

json_string: 需要解析的json 字段

path: 用.key 取出想要获取的value;

9. 聚合统计函数

ELLA用户2018年平均支付金额，以及2018年最大支付日期和最小支付日期的时间间隔

select  avg(pay_amount) as avg_amount,
			datediff(max(from_unixtime(pay_time,"yyyy-MM-DD"),min(from_unixtime(pay_time,"yyyy-MM-dd")))
from user_trade
where year(dt)='2018'  and user_name='ELLA';

聚合函数之间不允许互相嵌套：avg(count(*))

2018年购买商品品类两个以上的用户数：

select count(a.user_name)
from(
	select user_name,
			count(distinct goods_category) as category_num
	from user_trade
	where year(dt)='2018' 
	group by user_name having count(distinct goods_category )>2 )as a;

用户激活时间在2018年，年龄段在20-30和30-40的婚姻状况分布

select a.age_type,
		if(a.marriage_status = 1,"已婚"，"未婚“”),
		count(distinct a.user_id)
from(
		select case when age<20 then "20岁以下"
							when age >=20 and age < 30 then "20-30岁"
							when age>=30 and age < 40 then "30-40岁"
							else "40岁以上" end as age_type,
							get_json_object(extra1, "$.marriage_status") as marriage_status,
							user_id
		from  user_info
		where to_date(firstactivetime) between '2018-01-01' and '2018-012-31'  ) as a; 
where a.age_type in ("20-30岁","30-40岁")
group by a.age_type,
        		if(a.marriage_status = 1,"已婚"，"未婚“”)；

hive 返回对应周 hive函数大全过往记忆_字段_15