Can Hive change an external table's storage format? Changing a Hive external table's path

Reposted

数据小香 2023-07-14 11:57:26


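1. The tables we covered last time were all internal (managed) tables, whose data Hive keeps inside its own warehouse directory. This time we create an external table:

(1) Create an empty directory on HDFS: hdfs dfs -mkdir /t1_emp

(2) Upload the data file into that directory: hdfs dfs -put ~/salary.txt /t1_emp

(3) Create a table in Hive. Unlike the tables created before, it uses the EXTERNAL keyword, and the LOCATION on the last line points at the directory just created: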

CREATE EXTERNAL TABLE `emp_external`(
  `id` int, 
  `name` string, 
  `job` string, 
  `birth` string, 
  `salary` int, 
  `dep` int)
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY ',' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://localhost:9000/t1_emp';

(4) Query the new external table: select * from emp_external;

2. Some table modification operations

(1) Rename a table: alter table emp rename to emp_new;

(2) Add a column: alter table emp_new add columns (address string);

(3) Change a column's name and data type: alter table emp_new change column id empid int;

(4) Replace the entire column list with a new one: alter table emp_new replace columns (`_id` int, `_name` string, `_address` string);

(5) Add a partition: alter table emp_part add partition (day_id='20220509');

Or, with an explicit location: alter table emp_part add partition (day_id='20220510') location 'hdfs://localhost:9000/t3emp_part/day_id=20220510';

(6) Drop a partition: alter table emp_part drop if exists partition (day_id='20220509');

(7) Drop a table: drop table if exists emp_new;

(8) View the partition data: select * from emp_part; (an empty result means the partitions contain no data or do not exist)
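The operations above do not cover the two changes named in the title, so here is a minimal sketch of them. It assumes the emp_external table from section 1; the new directory /t1_emp_new is a hypothetical path used only for illustration. Both statements change metadata only: Hive does not move or convert any files, so the data already in the target directory has to match the format you declare.

```sql
-- Point the external table at a different HDFS directory.
-- Files in the old directory are left where they are.
-- '/t1_emp_new' is a hypothetical path for this example.
ALTER TABLE emp_external SET LOCATION 'hdfs://localhost:9000/t1_emp_new';

-- Change the storage format recorded for the table.
-- Existing files are NOT rewritten, so only do this when the files
-- under the table's location really are in the new format.
ALTER TABLE emp_external SET FILEFORMAT ORC;

-- Check the resulting Location, InputFormat and OutputFormat.
DESCRIBE FORMATTED emp_external;
```

If the files are still delimited text after the format is switched to ORC, queries on the table will fail, which is why the check at the end is worth running before relying on the change.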

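3. Creating a partitioned external table

(1) Create an empty directory on HDFS: hdfs dfs -mkdir /t3_emp_partition

(2) Create a table in Hive. Unlike before, add PARTITIONED BY (day_id string) in the middle, which partitions the data by the value of day_id, and set the LOCATION on the last line to the directory just created: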

create external table emp_part(
  `id` int,
  `name` string,
  `job` string,
  `birth` string,
  `salary` int,
  `dep` int)
PARTITIONED BY (day_id string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://localhost:9000/t3_emp_partition';

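(3) Load data into a partition: LOAD DATA LOCAL INPATH '/home/chang/salary.txt' OVERWRITE INTO TABLE emp_part partition (day_id='20220509');

(4) From the Hive prompt, run dfs -ls /t3_emp_partition; to list the partition directories. You can add several partitions so that the data is easier to manage.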
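Besides listing the HDFS directory, the partitions can also be inspected from Hive itself. The following is a small sketch that assumes the emp_part table and the day_id='20220509' partition loaded in step (3).

```sql
-- List the partitions registered in the metastore for emp_part.
SHOW PARTITIONS emp_part;

-- Read a single partition; the filter on day_id lets Hive scan
-- only the matching partition directory instead of the whole table.
SELECT * FROM emp_part WHERE day_id = '20220509';
```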


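4. Differences between external and internal tables

(1) Storage location: an internal table's data is stored by default under Hive's /user/hive/warehouse, while an external table's data location is specified by the user;

(2) Dropping: dropping an internal table deletes the actual data, whereas dropping an external table removes only the table metadata and leaves the actual data untouched.

(3) Creation: an external table must be created with the external keyword.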
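To see which kind of table you are dealing with, or to switch an existing table from one kind to the other, something like the following can be used. This is a sketch that assumes the emp_external table from section 1; how the conversion behaves can vary between Hive versions and configurations, so treat it as a starting point rather than a guaranteed recipe.

```sql
-- The "Table Type" row shows MANAGED_TABLE or EXTERNAL_TABLE,
-- and the "Location" row shows where the data lives.
DESCRIBE FORMATTED emp_external;

-- Turn the external table into an internal (managed) one.
-- After this, DROP TABLE would also delete the data files.
ALTER TABLE emp_external SET TBLPROPERTIES ('EXTERNAL'='FALSE');

-- Turn it back into an external table, so DROP TABLE
-- removes only the metadata again.
ALTER TABLE emp_external SET TBLPROPERTIES ('EXTERNAL'='TRUE');
```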

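5. Using the array type

1. Create a table hive_array that has an array column.

The source file holds one record per line: a name, a tab, and then a comma-separated list of work locations.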

create table hive_array(
name string,
work_locations array<string>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' 
COLLECTION ITEMS TERMINATED BY ',';  
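'\t' is the delimiter between columns.
',' is the delimiter between the items inside a collection.
A collection here means the elements stored in the complex type array.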

2. Load data

Use load data to import a local file into the hive_array table:

load data local inpath '/home/hadoop/Desktop/hive_array.txt'  overwrite into table hive_array;

3. Inspect the table

(1) Use select * from hive_array; to view the data;

(2) Use desc hive_array; to view the table definition.


(3) Read a specific element of the array, for example the first element of every row:

select work_locations[0] from hive_array;


(4) Select all rows whose array contains 'shanghai':

select * from hive_array where array_contains(work_locations,'shanghai');
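Two other built-in operations are often useful with array columns: counting the elements and turning each element into its own row. The sketch below assumes the hive_array table defined above; the aliases n_locations, loc and city are names chosen only for this example.

```sql
-- Number of elements in each row's array.
SELECT name, size(work_locations) AS n_locations
FROM hive_array;

-- One output row per (name, array element) pair.
SELECT name, city
FROM hive_array
LATERAL VIEW explode(work_locations) loc AS city;
```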


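6. The map data type

(1) Create a table that has a map column.

The source file holds one record per line with four comma-separated fields: id, name, members, and age. Inside members, the entries are separated by '#' and each key is separated from its value by ':'.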

create table hive_map(
id int,
name string,
members map<string,string>,
age int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','  -- fields are separated by commas
COLLECTION ITEMS TERMINATED BY '#'  -- entries of the map (key:value pairs) are separated by #
MAP KEYS TERMINATED BY ':';
-- within each entry, the key and the value are separated by a colon

(2) Load data

load data local inpath '/home/chang/data/hive_map.txt' overwrite into table hive_map;
Loading data to table t3.hive_map

(3) View the data

select * from hive_map;
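Individual map entries can be read by key, and the keys and values can be pulled out as arrays. The sketch below assumes the hive_map table defined above. The key 'father' is a hypothetical example, because the actual keys in the source file are not shown here; a key that is missing from a row simply yields NULL.

```sql
-- Look up one entry by key ('father' is a hypothetical key).
SELECT name, members['father'] AS father
FROM hive_map;

-- All keys, all values, and the number of entries per row.
SELECT name,
       map_keys(members)   AS member_keys,
       map_values(members) AS member_values,
       size(members)       AS n_members
FROM hive_map;
```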


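7. The struct data type

(1) Create a table that has a struct column.

The source file holds one record per line: an ip, a '#', and then the struct members name and age separated by ':'.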

create table hive_struct(
ip string,
userinfo struct<name:string,age:int>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '#'  -- fields are separated by #
COLLECTION ITEMS TERMINATED BY ':';  -- members of the struct are separated by a colon
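Once data has been loaded, the members of a struct are read with dot notation. This is a minimal sketch that assumes the hive_struct table defined above already contains rows; the threshold 30 is an arbitrary value chosen for illustration.

```sql
-- Read individual struct members with column.member syntax.
SELECT ip, userinfo.name, userinfo.age
FROM hive_struct;

-- Struct members can also be used in filters.
SELECT ip, userinfo.name
FROM hive_struct
WHERE userinfo.age > 30;
```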