Hive SQL: Getting the Year from a Date

Reposted

mob6454cc71d565 2024-09-04 11:01:36


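1. Data loading [must master] (1, 2, 3, 4, 5)

Loading from a local file:
hadoop fs -put local_path hdfs_path
load data local inpath 'local_path' into table table_name;
load data local inpath 'local_path' overwrite into table table_name;

Loading from one table into another:
insert into table table_name select ...;        -- append
insert overwrite table table_name select ...;   -- overwrite
create table table_name like other_table;       -- copy another table's structure when creating the table

1/ In real projects, data is usually imported from MySQL straight into the ODS tables with sqoop.
2/ Background: the teacher built his own data source. He created a file on HDFS, saved the data into it by editing the file, and could then query it right away. That reminded me that Hive's job is to map data on HDFS to tables. What confused me was that I did not remember this as one of the loading methods. Looking back, the methods above either put a Linux file onto HDFS (where Hive maps it) or load it directly into a table.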
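The loading paths above can be sketched in HiveQL as follows. The table names (demo_users, demo_users_copy, demo_users_ext), the file paths, and the comma delimiter are all hypothetical and chosen only for illustration. The external table at the end shows one way the "file already on HDFS is immediately queryable" behaviour can arise: Hive reads whatever files sit under the table's location.

```sql
-- Hypothetical table; columns and delimiter are assumptions for this sketch
create table if not exists demo_users (id int, name string)
row format delimited fields terminated by ',';

-- Local Linux file -> table (append, then overwrite)
load data local inpath '/tmp/users.csv' into table demo_users;
load data local inpath '/tmp/users.csv' overwrite into table demo_users;

-- File already on HDFS -> table (the HDFS file is moved into the table directory)
load data inpath '/data/staging/users.csv' into table demo_users;

-- Table -> table
create table if not exists demo_users_copy like demo_users;   -- structure only, no rows
insert overwrite table demo_users_copy select id, name from demo_users;

-- Mapping a directory that already holds files: any file placed under
-- /data/demo_users_ext (a hypothetical path) becomes queryable without a load step
create external table if not exists demo_users_ext (id int, name string)
row format delimited fields terminated by ','
location '/data/demo_users_ext';
```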

2. Creating a bucketed, sorted table (1, 2, 3, 4, 5) [key point]

create table table_name(
    name string,
    id int,
    years array<string>
)
clustered by (name) sorted by (id desc) into 4 buckets
row format delimited
    collection items terminated by "|";

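3. Creating a plain table and bucketing/sorting at query time [key point] (1, 2, 3, 4, 5)

set mapreduce.job.reduces=n;
cluster by: cannot sort in descending order; it is equivalent to distribute by + sort by on the same column.
distribute by + sort by: can sort in descending order. sort by orders the rows within each reducer, whereas order by is the global sort.

-- Why bucket and sort at query time instead of when creating the table? Bucketing columns should be columns that are queried and joined frequently.
-- Some columns are joined only occasionally, but we still want that join to be efficient; in that case this method can be used. Why? A bucketed sort is more efficient than a global sort.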
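A minimal sketch of query-time bucketing and sorting, assuming a hypothetical plain table demo_users(id int, name string). The reducer count of 4 is arbitrary.

```sql
-- Number of reducers = number of "buckets" the rows are distributed into
set mapreduce.job.reduces=4;

-- cluster by: distribute and sort on the same column, ascending only
select id, name from demo_users cluster by id;

-- The equivalent spelled out; this form also allows descending order
select id, name from demo_users distribute by id sort by id desc;

-- Distribute on one column and sort on another within each reducer
select id, name from demo_users distribute by name sort by id desc;
```

The rows are sorted inside each reducer's output only; a single globally ordered result still needs order by.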

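4. Regular expression matching [for awareness] (1, 2, 3, 4, 5)

rlike
.     matches any single character
*     repeats the preceding element zero or more times, so .* matches any number of any characters
....  is equivalent to .{4}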
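A few literal checks illustrate these patterns. The sample strings are made up; rlike matches if the pattern is found anywhere in the string, so ^ and $ are used here to anchor the whole value.

```sql
select 'abcd' rlike '^....$';    -- true: exactly four characters
select 'abcd' rlike '^.{4}$';    -- true: same pattern written with a count
select 'abc'  rlike '^.{4}$';    -- false: only three characters
select 'hive' rlike '^h.*e$';    -- true: starts with h, ends with e, anything between
```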

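5. union queries [important] (1, 2, 3, 4, 5)

union adds rows; join adds columns.
union removes duplicates by default; to keep duplicates, use union all.
After a union the rows come back in a default order (it appears to follow hash values). To sort by your own rule, combine the results first and then sort the combined result.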
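The difference between union and union all, and sorting after combining, can be sketched with literal rows. This assumes a Hive version that accepts select without from and union (distinct) inside a subquery (Hive 1.2 or later); the alias t is arbitrary.

```sql
-- union: duplicates removed, so two rows (1 and 2); row order is not guaranteed
select t.id from (
    select 1 as id
    union
    select 1 as id
    union
    select 2 as id
) t;

-- union all: duplicates kept, so three rows
select t.id from (
    select 1 as id
    union all
    select 1 as id
    union all
    select 2 as id
) t;

-- Custom ordering: combine first, then sort the combined result
select t.id from (
    select 2 as id
    union all
    select 1 as id
) t
order by t.id desc;
```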

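6. Sampling [understand] (1, 2, 3, 4)

tablesample(bucket x out of y on column)
x: the bucket to start from. Bucket numbering starts at 1, unlike array indexes, which start at 0.
y: the total number of buckets ÷ y is the number of buckets sampled.
x must never be greater than y.
column: the column to sample on.
The tablesample clause goes right after the table name; if the table has an alias, it goes before the alias.
select * from table_name tablesample(bucket x out of y on column);
rand() returns a random floating-point number between 0 and 1, including 0 but not 1.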
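As a sketch of these rules, assume a hypothetical table demo_bucketed that was created with 4 buckets on id. The alias s is arbitrary and, as noted above, comes after the tablesample clause.

```sql
-- y = 4: 4 / 4 = 1 bucket is read, starting from bucket 1
select * from demo_bucketed tablesample(bucket 1 out of 4 on id) s;

-- y = 2: 4 / 2 = 2 buckets are read, starting from bucket 1
select * from demo_bucketed tablesample(bucket 1 out of 2 on id) s;

-- Sampling on rand() instead of a column gives a random sample of rows
select * from demo_bucketed tablesample(bucket 1 out of 4 on rand()) s;
```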

7. Virtual columns [for awareness] (1, 2, 3)

INPUT__FILE__NAME: shows the file in which the row's data is stored
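A short sketch of querying virtual columns, assuming a hypothetical table demo_users(id int, name string). Note the double underscores in the names. BLOCK__OFFSET__INSIDE__FILE is a second built-in virtual column that is not mentioned in the notes; it is included here only for comparison.

```sql
select
    INPUT__FILE__NAME,             -- HDFS file that holds this row
    BLOCK__OFFSET__INSIDE__FILE,   -- offset of the row's block inside that file
    id,
    name
from demo_users;
```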

Hive functions

1. Distinguish them from Python functions
2. They are divided into the categories below

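8. Function categories (1, 2, 3)

UDF (ordinary function): one row in, one row out, e.g. round() for rounding
UDTF (table-generating function): one row in, multiple rows out, e.g. the explode function
UDAF (aggregate function): multiple rows in, one row out, e.g. count()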

9. Looking up how to use a function [important] (1, 2, 3)

show functions;
desc function extended function_name;
desc function extended rand;   (no parentheses after the function name)
 
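10. String functions (1, )
String concatenation with concat_ws and string splitting with split (memorize the two side by side)

'传智,有你,会更好'
'我','是','帅哥'
concat('我','是')                   --> 我是
concat_ws('-','我','是','帅哥')     --> the separator comes first
split('传智,有你,会更好', ',')      --> the delimiter comes last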
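The pairing of concat_ws and split can be checked with plain ASCII literals (made-up values):

```sql
select concat('a', 'b', 'c');                   -- abc
select concat_ws('-', 'a', 'b', 'c');           -- a-b-c   (separator first)
select split('a,b,c', ',');                     -- ["a","b","c"]   (delimiter last)
select split('a,b,c', ',')[0];                  -- a   (array indexes start at 0)
select concat_ws('-', split('a,b,c', ','));     -- a-b-c   (concat_ws also accepts an array of strings)
```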
 
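Extracting part of a string [must master]

'我爱北京天安门abc'
/*
  我   爱   北   京   天   安   门    a    b    c
   1    2    3    4    5    6    7    8    9   10
 -10   -9   -8   -7   -6   -5   -4   -3   -2   -1
 */
substr(string, start position, length)
substr('我爱北京天安门abc', 5, 3)     --> 天安门
substr('我爱北京天安门abc', -3, 3)    --> abc
This shows that in both cases the characters are taken from left to right, beginning at the start position.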
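The same string can be used to try a few more start positions. The results assume that substr counts characters rather than bytes, which is how Hive treats string values.

```sql
select substr('我爱北京天安门abc', 1, 2);    -- 我爱
select substr('我爱北京天安门abc', 5, 3);    -- 天安门
select substr('我爱北京天安门abc', -3, 2);   -- ab    (starts at 'a', still reads left to right)
select substr('我爱北京天安门abc', 8);       -- abc   (no length: through the end of the string)
```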

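Where Hive differs from Java and Python

trim
select trim(' su sf ');
trim removes the whitespace on both sides but not in the middle.
Hive's trim does not remove tabs or newlines; the Java and Python equivalents do remove tabs (\t) and newlines (\n).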
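A small sketch of the behaviour described above. The brackets are added only to make the remaining spaces visible. The regexp_replace line is one possible workaround for stripping tabs and newlines as well; it is an addition for illustration, not something from the notes.

```sql
-- Spaces on both sides are removed; the inner space stays
select concat('[', trim('  su sf  '), ']');        -- [su sf]

-- ltrim / rtrim remove spaces on one side only
select concat('[', ltrim('  su sf  '), ']');       -- [su sf  ]
select concat('[', rtrim('  su sf  '), ']');       -- [  su sf]

-- Possible workaround: strip any leading/trailing whitespace, including \t and \n
select regexp_replace('\tsu sf\n', '^\\s+|\\s+$', '');
```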

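11. Date and time functions [memorize all]

Timestamps, dates, and the standard time format.

The two cases in which converting a time to a date fails.

Getting part of a given time (year, quarter), getting the difference between times (3 cases), and adding to or subtracting from a time (counted in days).

[Converting a time to a Unix timestamp, converting a Unix timestamp back to a time type, and formatting a time type into the form you want with date_format.] Memorize tomorrow morning.

Summary: whether converting or extracting, the input must be in the standard time format.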

select `current_date`();
select `current_timestamp`();
Standard format: yyyy-MM-dd HH:mm:ss
Example value: '2023-04-06 14:46:47.040000000'
Conversion fails when the format is wrong, e.g. '2023年11月1日'
Conversion fails when the date is incomplete, e.g. '2023-11-'
select year(`current_timestamp`());
select datediff(time, time);         -- (date, time) also works
select date_add(date_or_time, 5);    -- adds 5 days; a negative number such as -4 subtracts days
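The date and time operations listed in this section can be sketched with literal values. The sample dates are made up; quarter and date_format assume a reasonably recent Hive (quarter was added in 1.3, date_format in 1.2). No result is shown for unix_timestamp because it depends on the session time zone.

```sql
-- Extracting parts of a date or timestamp
select year('2023-11-01 14:46:47');                  -- 2023
select quarter('2023-11-01');                        -- 4
select to_date('2023-11-01 14:46:47');               -- 2023-11-01

-- Differences and shifts, counted in days
select datediff('2023-11-05', '2023-11-01');         -- 4   (first argument minus second)
select date_add('2023-11-01', 5);                    -- 2023-11-06
select date_add('2023-11-01', -4);                   -- 2023-10-28
select date_sub('2023-11-01', 4);                    -- 2023-10-28

-- Time -> Unix timestamp -> time, and custom formatting
select unix_timestamp('2023-11-01 14:46:47');
select from_unixtime(unix_timestamp('2023-11-01 14:46:47'), 'yyyy-MM-dd');   -- 2023-11-01
select date_format('2023-11-01 14:46:47', 'yyyy/MM/dd');                     -- 2023/11/01
```

Inputs that are not in the standard format, such as the two failing examples in the notes, are not parsed and do not produce a usable year.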

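12. Math functions (1, )

How to get a random integer from 1 to 7? ceil(rand()*7)
How to get a random integer from 5 to 10? floor(rand()*6) + 5   (ceil(rand()*5 + 5) in practice yields only 6 to 10, because it returns 5 only when rand() is exactly 0)
Rounding up and rounding down: ceil, floor
Also: rand(), round()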
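One common way to draw a random integer from a closed range [lo, hi] is floor(rand() * (hi - lo + 1)) + lo. Because rand() never returns 1, every value in the range is equally likely. This is a general recipe offered as a sketch, not a formula from the notes.

```sql
select floor(rand() * 7) + 1;        -- random integer in 1..7
select floor(rand() * 6) + 5;        -- random integer in 5..10

-- Rounding helpers
select ceil(3.2);                    -- 4
select floor(3.8);                   -- 3
select round(3.14159, 2);            -- 3.14
```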

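13. Conditional functions (extremely important) [every part of the upcoming new-retail project uses them; drill them 5 times first] (1, 2, 3)

1. The if conditional function (Hive's); yesterday's if was the shell one
hive:
    if(condition, value returned when true, value returned when false)
    select name, if(gender='男','男生','女生') from table_name;
2. The null type and null checks
    null = null still returns null, not a boolean
    is null
    is not null
3. Replacing nulls
    nvl(column, default value)
    create table table_name1 as select if(gender='男',null,'女生') as gender from table_name2;
    select nvl(gender,'男生') from table_name1;
4. Getting the first value that is not null
    coalesce(1,null,3,null)
    coalesce(`array`(null,1,3))
    An array argument is accepted, but a single array is treated as one value as a whole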
5. The two forms of case when. Columns: orderid, paytime, totalmoney1, paypyte

-- Form 1: case <column> when <value> then <returned data>
select
    orderid,
    paytime,
    case totalmoney1
        when 0 then '现金'   -- there must be no comma at this position
        when 2 then '微信'
        else '未知'
    end
    as totalmoney2,
    paypyte
from table_name;

-- Form 2: case when <condition> then <returned data>
select
    orderid,
    paytime,
    case
        when totalmoney1 = 0 then ''
        when totalmoney1 = 2 then ''
        else '未知'
    end
    as totalmoney2,
    paypyte
from table_name;
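The conditional functions in this section can be tried on literal values without any table. The English labels are made up for the sketch, and the explicit casts only give the null literals a type.

```sql
select if(1 < 2, 'yes', 'no');                          -- yes

-- Comparing with null yields null; use is null / is not null instead
select cast(null as int) = cast(null as int);           -- NULL
select cast(null as int) is null;                       -- true
select 1 is not null;                                   -- true

-- Null replacement
select nvl(cast(null as string), 'default');            -- default
select coalesce(cast(null as int), 3, 4);               -- 3
select coalesce(1, null, 3, null);                      -- 1

-- The two case forms
select case 2 when 0 then 'cash' when 2 then 'wechat' else 'unknown' end;   -- wechat
select case when 5 > 3 then 'big' else 'small' end;                         -- big
```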

14. Data type conversion

cast(original value as target data type)
cast('123.4' as int);
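A few more cast calls on made-up values. The last line makes the truncation explicit by going through double first, which avoids relying on how a string with a decimal point is parsed as an integer.

```sql
select cast('123' as int);                       -- 123
select cast('abc' as int);                       -- NULL  (not a number)
select cast(123 as string);                      -- 123   (now a string)
select cast('123.4' as double);                  -- 123.4
select cast(cast('123.4' as double) as int);     -- 123   (fraction dropped)
```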

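15. Other functions (hash, CRC)

Hash values: used by bucketed sorting and by the default ordering of union
CRC (cyclic redundancy check): used, for example, to verify that a package is intact when an app is downloaded on a phone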

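16. Collection functions [understand]

array_contains: checks whether a value is in the array
array_contains(array(1,null,4,2,7), 7)   -- checks whether 7 is in the array
sort_array(array(1,3,2,0,4))             -- sorts the array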
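The collection functions can be checked on literal arrays. size is an extra built-in that is not in the notes, shown only for completeness.

```sql
select array_contains(array(1, 4, 2, 7), 7);     -- true
select array_contains(array(1, 4, 2, 7), 5);     -- false
select sort_array(array(1, 3, 2, 0, 4));         -- [0,1,2,3,4]  (ascending)
select size(array(1, 3, 2));                     -- 3
```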

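17. CTE expressions [very important] (1, )

with temp_name1 as (query 1), temp_name2 as (query 2), temp_name3 as (query 3)
Query 3 can use temp_name1 and temp_name2.
Advantage: a query is written once and reused by name. As noted in class, the table only needs to be loaded into memory once, whereas without a CTE it is read and loaded each time it is used (whether Hive actually keeps the CTE result depends on configuration).

with table_name1 as (select * from table_name2),
     table_name3 as (select * from table_name1)
select name, id from table_name3;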

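18. The explode function and lateral view [understand]

explode works on array and map columns; lateral view is the side view

select explode(`map`())                    -- two columns, N rows: exploded into a key column and a value column
select explode(`array`(1,2,3,4,5,6))       -- one column, 6 rows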


 
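Example:

1. Create the table
create table table_name1(
    id int,
    years array<string>
)
row format delimited
    collection items terminated by "|";

2. How to explode it?
Option 1:
select explode(years) as year from table_name1;
3. Use a lateral view to join the exploded rows back to the table
select id, year from table_name1 lateral view explode(years) b as year;
-- b is the alias of the exploded table (an alias is required); year is the column alias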
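An end-to-end sketch of explode with and without a lateral view. The table demo_years, its field delimiter, the sample data line, and the aliases t, b, yr, k and v are all assumptions made for illustration.

```sql
-- Hypothetical table; a data line could look like:  1,2021|2022|2023
create table if not exists demo_years (
    id int,
    years array<string>
)
row format delimited
    fields terminated by ','
    collection items terminated by '|';

-- explode alone: one output row per array element, but no other columns may be selected
select explode(years) as yr from demo_years;

-- lateral view: each exploded element is paired with the other columns of its source row
select t.id, b.yr
from demo_years t
lateral view explode(t.years) b as yr;

-- Exploding a map gives two columns, key and value
select explode(map('a', 1, 'b', 2)) as (k, v);
```

For the sample line above, the lateral view query would return three rows, each with id 1 and one of the three years.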


 
N. Functions used today

rand(): returns a random floating-point number
round(): rounds to the given number of decimal places

Vocabulary:
extended: expanded, enlarged
coalesce: 
trim:
substr:
floor:
ceil:
 
Not yet memorized
Getting the first value that is not null

contains
coalesce

