How to inspect Snappy compression on Hive tables: Hive file compression

Reposted
编程梦想实现家 2024-05-17 15:39:03
Hive can work with several file formats, such as plain text, LZO, and ORC. To understand how they relate to each other, I ran the following tests.

I. Create a sample table
hive> create table tbl( id int, name string ) row format delimited fields terminated by '|' stored as textfile;
OK
Time taken: 0.338 seconds

hive> load data local inpath '/home/grid/users.txt' into table tbl;
Copying data from file:/home/grid/users.txt
Copying file: file:/home/grid/users.txt
Loading data to table default.tbl
Table default.tbl stats: [numFiles=1, numRows=0, totalSize=111, rawDataSize=0]
OK
Time taken: 0.567 seconds

hive> select * from tbl;
OK
1       Awyp
2       Azs
3       Als
4       Aww
5       Awyp2
6       Awyp3
7       Awyp4
8       Awyp5
9       Awyp6
10      Awyp7
11      Awyp8
12      Awyp5
13      Awyp9
14      Awyp20
Time taken: 0.237 seconds, Fetched: 14 row(s)
 
II. Write tests
1. No compression
hive> set hive.exec.compress.output;
hive.exec.compress.output=false

hive>
>
> create table tbltxt as select * from tbl;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1498527794024_0001, Tracking URL = http://hadoop1:8088/proxy/application_1498527794024_0001/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1498527794024_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-06-27 10:55:29,906 Stage-1 map = 0%,  reduce = 0%
2017-06-27 10:55:39,532 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.66 sec
MapReduce Total cumulative CPU time: 2 seconds 660 msec
Ended Job = job_1498527794024_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop1:9000/tmp/hive-grid/hive_2017-06-27_10-55-18_962_2187345348997213497-1/-ext-10001
Moving data to: hdfs://hadoop1:9000/user/hive/warehouse/tbltxt
Table default.tbltxt stats: [numFiles=1, numRows=14, totalSize=111, rawDataSize=97]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 2.66 sec   HDFS Read: 318 HDFS Write: 181 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 660 msec
OK
Time taken: 22.056 seconds

hive>
> show create table tbltxt;
OK
CREATE  TABLE `tbltxt`(
`id` int,
`name` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://hadoop1:9000/user/hive/warehouse/tbltxt'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='true',
'numFiles'='1',
'numRows'='14',
'rawDataSize'='97',
'totalSize'='111',
'transient_lastDdlTime'='1498532140')
Time taken: 0.202 seconds, Fetched: 18 row(s)

hive>
>
> select * from tbltxt;
OK
1       Awyp
2       Azs
3       Als
4       Aww
5       Awyp2
6       Awyp3
7       Awyp4
8       Awyp5
9       Awyp6
10      Awyp7
11      Awyp8
12      Awyp5
13      Awyp9
14      Awyp20
Time taken: 0.059 seconds, Fetched: 14 row(s)

hive>
>
> dfs -ls /user/hive/warehouse/tbltxt;
Found 1 items
-rwxr-xr-x   1 grid supergroup        111 2017-06-27 10:55 /user/hive/warehouse/tbltxt/000000_0

hive>
>
> dfs -cat /user/hive/warehouse/tbltxt/000000_0;
1Awyp
2Azs
3Als
4Aww
5Awyp2
6Awyp3
7Awyp4
8Awyp5
9Awyp6
10Awyp7
11Awyp8
12Awyp5
13Awyp9
14Awyp20

The input and output formats are:
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
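The data reads back correctly. The data file is plain text and can be viewed directly with cat. (The fields appear run together in the cat output because the table created by CTAS uses Hive's default field delimiter, the non-printing Ctrl-A character \001, rather than the '|' of the source table.)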
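
As a cross-check on the numbers above, here is a minimal Python sketch that rebuilds the bytes of 000000_0. It assumes the CTAS table separates fields with Hive's default delimiter (the non-printing \x01 character) and ends each row with a newline; the helper name build_text_file is made up for this illustration. Under those assumptions the length matches totalSize=111, and the length without the 14 newlines happens to equal the reported rawDataSize=97.

```python
# Sample rows copied from the `select * from tbl` output.
ROWS = [
    (1, "Awyp"), (2, "Azs"), (3, "Als"), (4, "Aww"),
    (5, "Awyp2"), (6, "Awyp3"), (7, "Awyp4"), (8, "Awyp5"),
    (9, "Awyp6"), (10, "Awyp7"), (11, "Awyp8"), (12, "Awyp5"),
    (13, "Awyp9"), (14, "Awyp20"),
]


def build_text_file(rows, field_delim="\x01"):
    """Serialize rows as delimited text, one row per line (assumed layout)."""
    return "".join(f"{i}{field_delim}{name}\n" for i, name in rows).encode("ascii")


data = build_text_file(ROWS)
total_size = len(data)                  # bytes in the data file
raw_data_size = total_size - len(ROWS)  # same bytes minus one newline per row
print(total_size, raw_data_size)        # prints: 111 97
```

Because the delimiter is a single non-printing byte, cat shows `1Awyp` rather than `1|Awyp`.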

2. Compression enabled, using the default codec
hive>
> set hive.exec.compress.output=true;
hive>
>
> set mapred.output.compression.codec;
mapred.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec

The current compression codec is the default, DefaultCodec.

hive>
> create table tbldefault as select * from tbl;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1498527794024_0002, Tracking URL = http://hadoop1:8088/proxy/application_1498527794024_0002/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1498527794024_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-06-27 11:14:44,845 Stage-1 map = 0%,  reduce = 0%
2017-06-27 11:14:48,964 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.08 sec
MapReduce Total cumulative CPU time: 1 seconds 80 msec
Ended Job = job_1498527794024_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop1:9000/tmp/hive-grid/hive_2017-06-27_11-14-39_351_6035948930260680086-1/-ext-10001
Moving data to: hdfs://hadoop1:9000/user/hive/warehouse/tbldefault
Table default.tbldefault stats: [numFiles=1, numRows=14, totalSize=76, rawDataSize=97]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 1.08 sec   HDFS Read: 318 HDFS Write: 150 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 80 msec
OK
Time taken: 10.842 seconds

hive>
>
> show create table tbldefault;
OK
CREATE  TABLE `tbldefault`(
`id` int,
`name` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://hadoop1:9000/user/hive/warehouse/tbldefault'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='true',
'numFiles'='1',
'numRows'='14',
'rawDataSize'='97',
'totalSize'='76',
'transient_lastDdlTime'='1498533290')
Time taken: 0.044 seconds, Fetched: 18 row(s)

hive>
>
> select * from tbldefault;
OK
1       Awyp
2       Azs
3       Als
4       Aww
5       Awyp2
6       Awyp3
7       Awyp4
8       Awyp5
9       Awyp6
10      Awyp7
11      Awyp8
12      Awyp5
13      Awyp9
14      Awyp20
Time taken: 0.037 seconds, Fetched: 14 row(s)

hive>
>
> dfs -ls /user/hive/warehouse/tbldefault;
Found 1 items
-rwxr-xr-x   1 grid supergroup         76 2017-06-27 11:14 /user/hive/warehouse/tbldefault/000000_0.deflate
hive>
> dfs -cat /user/hive/warehouse/tbldefault/000000_0.deflate;
xws
dfX0)60K:HBhive>
>
>
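With the default codec, the table's input and output formats are the same as for the uncompressed text table, but the data file is compressed with the default library and carries the suffix .deflate, so its contents cannot be viewed directly with cat. This means that org.apache.hadoop.mapred.TextInputFormat can recognize the default compression from the file suffix and read the contents.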
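
To illustrate why `dfs -cat` prints unreadable bytes that begin with `x`, here is a small Python sketch. It assumes that DefaultCodec writes a standard zlib stream, which Python's zlib module can both produce and decode; the sample bytes are a stand-in, not the actual file from the cluster.

```python
import zlib

# Stand-in for a few rows of the table file (fields separated by \x01, an assumption).
plain = b"1\x01Awyp\n2\x01Azs\n3\x01Als\n4\x01Aww\n"

compressed = zlib.compress(plain)

# A zlib stream starts with byte 0x78, which a terminal renders as the letter "x".
print(compressed[:1])

# Decompressing recovers the original text, which is what a codec-aware reader does.
restored = zlib.decompress(compressed)
print(restored == plain)
```

Under the same assumption, a .deflate file copied to local disk could be decoded with zlib.decompress in the same way.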

3. LZO compression
hive>
> set mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;

hive>
>
> create table tbllzo as select * from tbl;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1498527794024_0003, Tracking URL = http://hadoop1:8088/proxy/application_1498527794024_0003/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1498527794024_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-06-27 11:29:08,436 Stage-1 map = 0%,  reduce = 0%
2017-06-27 11:29:14,638 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.87 sec
MapReduce Total cumulative CPU time: 1 seconds 870 msec
Ended Job = job_1498527794024_0003
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop1:9000/tmp/hive-grid/hive_2017-06-27_11-29-03_249_4340474818139134521-1/-ext-10001
Moving data to: hdfs://hadoop1:9000/user/hive/warehouse/tbllzo
Table default.tbllzo stats: [numFiles=1, numRows=14, totalSize=106, rawDataSize=97]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 1.87 sec   HDFS Read: 318 HDFS Write: 176 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 870 msec
OK
Time taken: 13.744 seconds

hive>
>
> show create table tbllzo;
OK
CREATE  TABLE `tbllzo`(
`id` int,
`name` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://hadoop1:9000/user/hive/warehouse/tbllzo'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='true',
'numFiles'='1',
'numRows'='14',
'rawDataSize'='97',
'totalSize'='106',
'transient_lastDdlTime'='1498534156')
Time taken: 0.044 seconds, Fetched: 18 row(s)

hive>
> select * from tbllzo;
OK
1       Awyp
2       Azs
3       Als
4       Aww
5       Awyp2
6       Awyp3
7       Awyp4
8       Awyp5
9       Awyp6
10      Awyp7
11      Awyp8
12      Awyp5
13      Awyp9
14      Awyp20
Time taken: 0.032 seconds, Fetched: 14 row(s)

hive>
>
> dfs -ls /user/hive/warehouse/tbllzo;
Found 1 items
-rwxr-xr-x   1 grid supergroup        106 2017-06-27 11:29 /user/hive/warehouse/tbllzo/000000_0.lzo_deflate
hive>
>
> dfs -cat /user/hive/warehouse/tbllzo/000000_0.lzo_deflate;
ob1Awyp
2Azs
3Als
4Aww
5Awyp2
6
7
8
9
10
1
125
13Awyp9
14Awyp20

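With LZO compression, the table's input format is still org.apache.hadoop.mapred.TextInputFormat and the data file suffix is .lzo_deflate. The contents cannot be read reliably with cat (only fragments of this small sample show through). In other words, org.apache.hadoop.mapred.TextInputFormat can recognize LZO compression and read the contents. (Very powerful!)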

4. LZOP compression
hive>
> set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;

hive>
> create table tbllzop as select * from tbl;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1498527794024_0004, Tracking URL = http://hadoop1:8088/proxy/application_1498527794024_0004/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1498527794024_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-06-27 11:37:28,010 Stage-1 map = 0%,  reduce = 0%
2017-06-27 11:37:32,127 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.1 sec
MapReduce Total cumulative CPU time: 2 seconds 100 msec
Ended Job = job_1498527794024_0004
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop1:9000/tmp/hive-grid/hive_2017-06-27_11-37-23_099_3493082162039010112-1/-ext-10001
Moving data to: hdfs://hadoop1:9000/user/hive/warehouse/tbllzop
Table default.tbllzop stats: [numFiles=1, numRows=14, totalSize=148, rawDataSize=97]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 2.1 sec   HDFS Read: 318 HDFS Write: 219 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 100 msec
OK
Time taken: 10.233 seconds

hive>
>
> show create table tbllzop;
OK
CREATE  TABLE `tbllzop`(
`id` int,
`name` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://hadoop1:9000/user/hive/warehouse/tbllzop'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='true',
'numFiles'='1',
'numRows'='14',
'rawDataSize'='97',
'totalSize'='148',
'transient_lastDdlTime'='1498534653')
Time taken: 0.046 seconds, Fetched: 18 row(s)

hive>
>
>
> select * from tbllzop;
OK
1       Awyp
2       Azs
3       Als
4       Aww
5       Awyp2
6       Awyp3
7       Awyp4
8       Awyp5
9       Awyp6
10      Awyp7
11      Awyp8
12      Awyp5
13      Awyp9
14      Awyp20
Time taken: 0.033 seconds, Fetched: 14 row(s)

hive>
>
> dfs -ls /user/hive/warehouse/tbllzop;
Found 1 items
-rwxr-xr-x   1 grid supergroup        148 2017-06-27 11:37 /user/hive/warehouse/tbllzop/000000_0.lzo
hive>
>
> dfs -cat /user/hive/warehouse/tbllzop/000000_0.lzo;
          ob1Awyp
2Azs
3Als
4Aww
5Awyp2
6
7
8
9
10
1
125
13Awyp9
14Awyp20

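Likewise, with LZOP compression the table's input format is still org.apache.hadoop.mapred.TextInputFormat and the data file suffix is .lzo. The contents cannot be read reliably with cat, but org.apache.hadoop.mapred.TextInputFormat can recognize LZOP compression and read the contents.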


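The cases above show that, whichever codec is used, Hive treats the data as plain text (merely compressed in different ways), and org.apache.hadoop.mapred.TextInputFormat can read all of them. When inserting with hive.exec.compress.output=true, Hive compresses solely according to mapred.output.compression.codec, regardless of the inputFormat in the table definition. The following tests verify this: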

1. With mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec set, inserted data is written as an LZOP-compressed file and reads back correctly.

hive> set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;

hive>
> create table tbltest1( id int, name string )
> stored as inputformat 'org.apache.hadoop.mapred.TextInputFormat'
> outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
OK
Time taken: 0.493 seconds

hive>
> insert into table tbltest1 select * from tbl;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1498660018952_0001, Tracking URL = http://hadoop1:8088/proxy/application_1498660018952_0001/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1498660018952_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-06-28 22:59:27,886 Stage-1 map = 0%,  reduce = 0%
2017-06-28 22:59:36,427 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.25 sec
MapReduce Total cumulative CPU time: 2 seconds 250 msec
Ended Job = job_1498660018952_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop1:9000/tmp/hive-grid/hive_2017-06-28_22-59-14_730_4437480099583255943-1/-ext-10000
Loading data to table default.tbltest1
Table default.tbltest1 stats: [numFiles=1, numRows=14, totalSize=148, rawDataSize=97]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 2.25 sec   HDFS Read: 318 HDFS Write: 220 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 250 msec
OK
Time taken: 24.151 seconds

hive>
> dfs -ls /user/hive/warehouse/tbltest1;
Found 1 items
-rwxr-xr-x   1 grid supergroup        148 2017-06-28 22:59 /user/hive/warehouse/tbltest1/000000_0.lzo

hive>
> select * from tbltest1;
OK
1       Awyp
2       Azs
3       Als
4       Aww
5       Awyp2
6       Awyp3
7       Awyp4
8       Awyp5
9       Awyp6
10      Awyp7
11      Awyp8
12      Awyp5
13      Awyp9
14      Awyp20
Time taken: 0.055 seconds, Fetched: 14 row(s)

2. With mapred.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec set, inserted data is written with the default compression and reads back correctly.

hive> set mapred.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec;

hive> create table tbltest2( id int, name string )
> stored as inputformat 'org.apache.hadoop.mapred.TextInputFormat'
> outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
OK
Time taken: 0.142 seconds

hive> insert into table tbltest2 select * from tbl;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1498660018952_0002, Tracking URL = http://hadoop1:8088/proxy/application_1498660018952_0002/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1498660018952_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-06-28 23:09:06,439 Stage-1 map = 0%,  reduce = 0%
2017-06-28 23:09:11,668 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.15 sec
MapReduce Total cumulative CPU time: 1 seconds 150 msec
Ended Job = job_1498660018952_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop1:9000/tmp/hive-grid/hive_2017-06-28_23-09-01_674_9172062679713398655-1/-ext-10000
Loading data to table default.tbltest2
Table default.tbltest2 stats: [numFiles=1, numRows=14, totalSize=76, rawDataSize=97]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 1.15 sec   HDFS Read: 318 HDFS Write: 148 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 150 msec
OK
Time taken: 11.278 seconds

hive>
>
>
> dfs -ls /user/hive/warehouse/tbltest2;
Found 1 items
-rwxr-xr-x   1 grid supergroup         76 2017-06-28 23:09 /user/hive/warehouse/tbltest2/000000_0.deflate

hive>
> select * from tbltest2;
OK
1       Awyp
2       Azs
3       Als
4       Aww
5       Awyp2
6       Awyp3
7       Awyp4
8       Awyp5
9       Awyp6
10      Awyp7
11      Awyp8
12      Awyp5
13      Awyp9
14      Awyp20
Time taken: 0.035 seconds, Fetched: 14 row(s)

3. When the table is stored as ORC, data is compressed according to the ORC table settings and is not affected by mapred.output.compression.codec or hive.exec.compress.output.
hive>  set hive.exec.compress.output=false;
hive> create table tbltest3( id int, name string )
> stored as orc tblproperties("orc.compress"="SNAPPY");
OK
Time taken: 0.08 seconds

hive>  insert into table tbltest3 select * from tbl;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1498660018952_0003, Tracking URL = http://hadoop1:8088/proxy/application_1498660018952_0003/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1498660018952_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-06-28 23:30:29,865 Stage-1 map = 0%,  reduce = 0%
2017-06-28 23:30:34,007 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.14 sec
MapReduce Total cumulative CPU time: 1 seconds 140 msec
Ended Job = job_1498660018952_0003
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop1:9000/tmp/hive-grid/hive_2017-06-28_23-30-25_350_7458831371800658041-1/-ext-10000
Loading data to table default.tbltest3
Table default.tbltest3 stats: [numFiles=1, numRows=14, totalSize=365, rawDataSize=1288]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 1.14 sec   HDFS Read: 318 HDFS Write: 439 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 140 msec
OK
Time taken: 9.963 seconds

hive> dfs -ls /user/hive/warehouse/tbltest3;
Found 1 items
-rwxr-xr-x   1 grid supergroup        365 2017-06-28 23:30 /user/hive/warehouse/tbltest3/000000_0

hive>
> dfs -cat /user/hive/warehouse/tbltest3/000000_0;
ORC
)
9
"
A+_Az_
+@DA+y-Az_A+_A++A+y-2345678,5A+y-9A+y-20
hive>
> show create table tbltest3;
OK
CREATE  TABLE `tbltest3`(
`id` int,
`name` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://hadoop1:9000/user/hive/warehouse/tbltest3'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='true',
'numFiles'='1',
'numRows'='14',
'orc.compress'='SNAPPY',
'rawDataSize'='1288',
'totalSize'='365',
'transient_lastDdlTime'='1498663835')
Time taken: 0.217 seconds, Fetched: 19 row(s)

hive>
> select * from tbltest3;
OK
1       Awyp
2       Azs
3       Als
4       Aww
5       Awyp2
6       Awyp3
7       Awyp4
8       Awyp5
9       Awyp6
10      Awyp7
11      Awyp8
12      Awyp5
13      Awyp9
14      Awyp20
Time taken: 0.689 seconds, Fetched: 14 row(s)
For an ORC table, then, inserted data is not affected by the compression parameters, and the inputformat and outputformat are no longer text.
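
Since the ORC data file carries no suffix, one way to tell it apart from a text file is by its leading bytes: the `dfs -cat` output above begins with `ORC`. A minimal Python sketch of that check follows; looks_like_orc is a hypothetical helper, and the sample byte strings are made up for illustration.

```python
ORC_MAGIC = b"ORC"


def looks_like_orc(head: bytes) -> bool:
    """Return True if the leading bytes of a file start with the ORC magic string."""
    return head.startswith(ORC_MAGIC)


is_orc_sample = looks_like_orc(b"ORC\x08\x03\x10")  # made-up ORC-like leading bytes
is_text_sample = looks_like_orc(b"1\x01Awyp\n")     # first row of a text table file
print(is_orc_sample, is_text_sample)
```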
III. Summary
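1. Whether the data is uncompressed, compressed with the default codec, or compressed with LZO or LZOP, Hive treats it as text format. On read, the compression is recognized automatically from the data file's suffix. On write, the parameters decide whether to compress and which codec to use.
2. ORC is a different format as far as Hive is concerned. However the parameters are set, data is read and written in the format specified by the CREATE TABLE statement.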
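
The suffix-based detection in point 1 can be sketched as a small lookup. This is a toy model in Python, not Hadoop's actual implementation; the suffixes are the ones observed in the tests above, and codec_for_file is a hypothetical helper.

```python
from typing import Optional

# Suffix -> codec class, as observed in the dfs -ls output of the tests above.
SUFFIX_TO_CODEC = {
    ".deflate": "org.apache.hadoop.io.compress.DefaultCodec",
    ".lzo_deflate": "com.hadoop.compression.lzo.LzoCodec",
    ".lzo": "com.hadoop.compression.lzo.LzopCodec",
}


def codec_for_file(path: str) -> Optional[str]:
    """Pick a codec by file suffix; None means 'treat the file as uncompressed'."""
    # Try longer suffixes first so a more specific suffix always wins.
    for suffix in sorted(SUFFIX_TO_CODEC, key=len, reverse=True):
        if path.endswith(suffix):
            return SUFFIX_TO_CODEC[suffix]
    return None


for name in ["000000_0", "000000_0.deflate", "000000_0.lzo_deflate", "000000_0.lzo"]:
    print(name, "->", codec_for_file(name))
```

ORC files are outside this lookup: per point 2, they are read through the ORC input format named in the table definition, not chosen by suffix.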
https://blog.51cto.com/bigdata1024/1942877