数据操作篇

加载数据(会生成partition,如果不存在的话;local字段表示为是本机目录,如果不加,则表示为hdfs上的目录;overwrite关键字表示删除目标目录,当没有则保留,但会覆盖同名旧目录)一些限制条件:使用partition,则table必须是partition的;hive不会去验证你的data是服务schema的,但是hive会验证文件是否符合表定义,如sequencefile的table,file也必须是sequence files

load data local inpath '/home/hadoop/hive/data'

overwrite into table employees

partition(country='china',state='beijing');

静态插入数据:

插入数据(select * from employee 这样选择一个分区表,会将其partition的column也返回)

insert overwrite table employees

partition (country='US',state='dallas')

select name,salary,subordinate,deductions,address from employees se

where se.country='china' and se.state='beijing'

插入多条数据

FROM staged_employees se

INSERT OVERWRITE TABLE employees

PARTITION (country = 'US', state = 'OR')

SELECT * WHERE se.cnty = 'US' AND se.st = 'OR'

INSERT OVERWRITE TABLE employees

PARTITION (country = 'US', state = 'CA')

SELECT * WHERE se.cnty = 'US' AND se.st = 'CA'

INSERT OVERWRITE TABLE employees

PARTITION (country = 'US', state = 'IL')

SELECT * WHERE se.cnty = 'US' AND se.st = 'IL';

动态插入并创建partition:

INSERT OVERWRITE TABLE employees

PARTITION (country, state)

SELECT ..., se.cnty, se.st

FROM staged_employees se;

动态静态创建partition

INSERT OVERWRITE TABLE employees

PARTITION (country = 'US', state)

SELECT ..., se.cnty, se.st

FROM staged_employees se

WHERE se.cnty = 'US';

 

有一些参数控制,动态创建partition

hive.exec.dynamic.partition

false 

Set to true to enable dynamic partitioning.

 

hive.exec.dynamic.partition.mode

strict 

Set to nonstrict to enable all partitions to be determined

dynamically.

 

hive.exec.max.dynamic.partitions.pernode

100 

The maximum number of dynamic partitions that can be created

by each mapper or reducer. Raises a fatal error if one

mapper or reducer attempts to create more than the threshold.

 

hive.exec.max.dynamic.partitions

+1000 

The total number of dynamic partitions that can be created by

one statement with dynamic partitioning. Raises a fatal error

if the limit is exceeded.

 

hive.exec.max.created.files

100000 

The maximum total number of files that can be created globally.

A Hadoop counter is used to track the number of files created.

Raises a fatal error if the limit is exceeded.

 

创建表,并加载数据

CREATE TABLE ca_employees

AS SELECT name, salary, address

FROM employees

WHERE se.state = 'CA';

 

导出数据

INSERT OVERWRITE LOCAL DIRECTORY '/tmp/ca_employees'

SELECT name, salary, address

FROM employees

WHERE se.state = 'CA';

 

批量导出数据

FROM staged_employees se

INSERT OVERWRITE DIRECTORY '/tmp/or_employees'

SELECT * WHERE se.cty = 'US' and se.st = 'OR'

INSERT OVERWRITE DIRECTORY '/tmp/ca_employees'

SELECT * WHERE se.cty = 'US' and se.st = 'CA'

INSERT OVERWRITE DIRECTORY '/tmp/il_employees'

SELECT * WHERE se.cty = 'US' and se.st = 'IL';