Hive SQL command for a non-null field; setting a column to not null when creating a Hive table

Reposted

小鱼儿 2023-07-12 21:31:31


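Hive partitioned table: newly added columns return NULL when queried, and how to fix it

Because of business growth, we needed to add new columns to a Hive partitioned table. The Hive version was 2.x.

So I used

alter table table_name add columns (col_name string);

to add the column, then inserted data into a partition that already existed, and assumed the job was done.

But a query showed that every value in the new column was NULL.

What was going on? I first suspected that the insert had failed, but the logs confirmed that the write succeeded. Querying the same data with two other engines, Impala and Presto, returned the values, and inspecting the data files directly in the partition directory showed that the values were there too.

After some digging, this turned out to be a metadata issue rather than a data issue: by default, ALTER TABLE ... ADD COLUMNS changes only the table-level metadata, so partitions that already exist keep their old column list. It is often described as a bug in older Hive versions, and reportedly the latest versions no longer have this problem (not verified).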

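1. Tracing the problem

To reproduce the problem, let's walk through it step by step.

1. Create a student test table and insert data into a partition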

create table if not exists test.student
(
id string comment 'ID',
user_name string comment 'name',
age int comment 'age'
) comment 'student table'
partitioned by (dt string comment 'partition column, format yyyymmdd')
stored as parquet
TBLPROPERTIES('parquet.compression'='SNAPPY');

Here dt is the partition column. Add one partition to the student table and insert some test records:

insert overwrite table test.student partition (dt='20220112') select user_id, '小爱',7 from test.table_name limit 10;

2. Add two columns, class and grade, and insert data

alter table test.student add columns(class string);
alter table test.student add columns(grade string);

insert overwrite table test.student partition (dt='20220112') select user_id, '小爱',7,'1班','一年级' from test.table_name limit 10;

3. Query the data

select * from test.student where dt ='20220112';


The newly added class and grade columns both show NULL, which is not what we expect.

Impala and Presto, however, display the values correctly.

4. Add a new partition, '20220113', to the table

insert overwrite table test.student partition (dt='20220113') select user_id, '小爱',7,'1班','一年级' from test.table_name limit 10;

5. Query the new partition

select * from test.student where dt ='20220113';


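The new partition shows class and grade correctly, but running

select * from test.student where dt ='20220112';

still returns NULL for both columns.

From this we can draw the following conclusion:

If a partition already existed before the columns were added, the new columns read as NULL in that partition.
If a partition was created after the columns were added, the new columns display normally.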
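The conclusion above can be pictured with a small toy model. This is only a sketch of the idea and not Hive's actual implementation: it assumes that each partition keeps its own copy of the column list, and that a column missing from that copy reads as NULL even though the data file contains it. All function and variable names here are hypothetical.

```python
# Toy model only: plain Python objects stand in for Hive metastore entries.
table_columns = ["id", "user_name", "age"]
partition_columns = {}  # partition value -> column list recorded for that partition


def add_partition(dt):
    """A new partition copies the table's column list; an existing one is left alone."""
    partition_columns.setdefault(dt, list(table_columns))


def add_columns(new_columns, cascade=False):
    """Without cascade, only the table-level column list changes (assumed default)."""
    table_columns.extend(new_columns)
    if cascade:
        for columns in partition_columns.values():
            columns.extend(new_columns)


def read_row(dt, stored_row):
    """Return every table column, using None (NULL) for columns the partition lacks."""
    known = set(partition_columns[dt])
    return {c: (stored_row.get(c) if c in known else None) for c in table_columns}


add_partition("20220112")        # partition exists before the columns are added
add_columns(["class", "grade"])  # no cascade
add_partition("20220112")        # insert overwrite: partition already exists
add_partition("20220113")        # partition created after the columns were added

# The same row is physically present in both partitions' data files.
row = {"id": "1", "user_name": "小爱", "age": 7, "class": "1班", "grade": "一年级"}
old_result = read_row("20220112", row)
new_result = read_row("20220113", row)
print(old_result["class"], new_result["class"])
```

In this model the data is written in both cases; only the column list recorded for the older partition hides the new values, which is consistent with Impala and Presto still showing them.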

2. Solutions

1. Drop the partitions or recreate the table

This is not recommended when there are many partitions or a large volume of data.

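2. Apply the change to each partition

For partitions that already existed before the columns were added, run the add-columns statement again at the partition level, in the same order as on the table:

alter table test.student partition(dt='20220112') add columns(class string);
alter table test.student partition(dt='20220112') add columns(grade string);

Now check whether class and grade display correctly for the '20220112' partition:

select * from test.student where dt ='20220112';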


The result shows that the columns now display correctly.
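When many partitions predate the new columns, the per-partition statements can be generated rather than typed by hand. The following is a minimal sketch, not a complete tool: the helper name partition_alter_statements and the list of partition values are hypothetical (in practice the values might be copied from the output of show partitions), and the helper does no quoting or escaping, so it assumes simple string partition values.

```python
def partition_alter_statements(table, partition_column, partition_values, new_columns):
    """Build one ALTER TABLE ... PARTITION ... ADD COLUMNS statement per partition."""
    # new_columns is a list of (column_name, hive_type) pairs, in table order.
    column_spec = ", ".join(f"{name} {hive_type}" for name, hive_type in new_columns)
    return [
        f"alter table {table} partition({partition_column}='{value}') "
        f"add columns({column_spec});"
        for value in partition_values
    ]


# Hypothetical list of partitions that existed before the columns were added.
old_partitions = ["20220110", "20220111", "20220112"]
statements = partition_alter_statements(
    "test.student", "dt", old_partitions, [("class", "string"), ("grade", "string")]
)
print("\n".join(statements))
```

The printed statements can be reviewed and then run in the Hive client. Only partitions created before the columns were added should be listed, since later partitions already have the new columns.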

3. Add cascade when adding the column to the table

With the second approach, a table with many partitions makes the per-partition fix tedious. Is there a more elegant way? Yes: add cascade when changing the columns.

alter table test.student add columns (`number` string ) cascade;

insert overwrite table test.student partition (dt='20220113') select user_id, '小爱',7,'1班','一年级','N202209010101' from test.table_name limit 10;

select * from test.student where dt ='20220113';


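Summary:

1. For partitions that already existed before the columns were added, run the statement again at the partition level:

alter table test.student partition(dt='20220112') add columns(column_name string);

2. Add cascade when adding the column to the table:

alter table test.student add columns (column_name string ) cascade;

Personally, I find the second option much more convenient than the first, and recommend it.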

