Hive中数据库的相关操作

数据库的创建、删除

hive (default)> create database if not exists wyTest;
OK
Time taken: 0.69 seconds
hive (default)> show databases;
OK
default
wytest
Time taken: 0.031 seconds, Fetched: 2 row(s)


删除数据库


hive (default)>
              >
              > drop database if exists wyTest;
OK
Time taken: 0.568 seconds

但是这里的数据库因为没有表在里面,所以可以直接删除,当数据库中含有表的时候,该命令不能直接进行删除,需要将表删除后再删除数据库。当然可以通过在命令后面拼接cascade来告诉Hive帮你去把内部的表删除。

hive (default)> drop database if exists wyTest cascade;
OK




Hive中表

首先创建一张工资表。在建立表的时候,如果说数据已经存在了,格式固定就需要根据表的数据去创建表的结构,而如果数据格式未确定的时候,可以根据自己的设计表,然后将数据按照设计的表的格式去组织数据文件。

hive> describe salaries;
OK
yearid                  int                     year
teamid                  string                  team
lgid                    string
salary                  int
Time taken: 0.138 seconds, Fetched: 4 row(s)
hive> describe extended salaries;
OK
yearid                  int                     year
teamid                  string                  team
lgid                    string
salary                  int

Detailed Table Information      Table(tableName:salaries, dbName:default, owner:root, createTime:147384090stAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:yearid, type:int, comment:year), Schema(name:teamid, type:string, comment:team), FieldSchema(name:lgid, type:string, comment:null), FieldScname:salary, type:int, comment:null)], location:hdfs://hadoopnodeservice1/user/hive/warehouse/salaries, inrmat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTexttFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadooe.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=,, field.delim=,}), bucketCols:[], sortCol parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}oredAsSubDirectories:false), partitionKeys:[], parameters:{numFiles=1, COLUMN_STATS_ACCURATE=true, transiestDdlTime=1473841185, numRows=0, totalSize=676848, rawDataSize=0}, viewOriginalText:null, viewExpandedText, tableType:MANAGED_TABLE)
Time taken: 0.121 seconds, Fetched: 6 row(s)



将本地的文件中的数据导入到Hive数据库的表中

hive> load data local inpath '/usr/local/dataFileTemp/Salaries.csv'
    > overwrite into table salaries;
Copying data from file:/usr/local/dataFileTemp/Salaries.csv
Copying file: file:/usr/local/dataFileTemp/Salaries.csv
Loading data to table default.salaries
rmr: DEPRECATED: Please use 'rm -r' instead.
Deleted hdfs://hadoopnodeservice1/user/hive/warehouse/salaries
Table default.salaries stats: [numFiles=1, numRows=0, totalSize=676848, rawDataSize=0]
OK
Time taken: 12.224 seconds



创建外部表

外部表主要的特点是,在Hive进行各种操作的时候,不会影响原来的数据本身,如果删除所有数据的时候,外部表中的数据不会被删除,但是Hive中的数据会被清空。

hive> create external table salaries_external(
    > yearId int comment 'year',
    > teamId string comment 'team',
    > lgId string,
    > salary int)
    > row format delimited
    > fields terminated by ','
    > location '/usr/local/dataFileTemp/Salaries.csv';
OK
Time taken: 0.731 seconds



分区

按照yearId进行分区,创建表的时候,自动按照yearId进行了分区,所以看到提示的信息中显示了分区已经存在。

hive> alter table salaries add partition(partCol='yearid') location '/usr/local/dataFileTemp/           /salaries/2016/09';
FAILED: SemanticException table is not partitioned but partition spec exists: {partcol=yearid



这里的分区,对于外部表一样可以进行分区的操作。


数据库表中信息变更


添加列

hive> alter table salaries add columns(
    > playerid string);
OK

现在各个列的次序依次为

hive>
    > describe salaries;
OK
yearid                  int                     year
teamid                  string                  team
lgid                    string
salary                  int
playerid                string
Time taken: 0.196 seconds, Fetched: 5 row(s)



更变列在表中的次序

把playerId 提前到salary字段的前面

hive> alter table salaries change column playerid playerid string before salary;
FAILED: ParseException line 1:60 missing EOF at 'before' near 'string'
hive> alter table salaries change column salary salary float after playerid;
OK
Time taken: 0.273 seconds



这里好像用before的话会报错的,所以我换成了after这样就没错了。

hive>
    > load data local inpath '/usr/local/dataFileTemp/Salaries.csv' overwrite into table salaries;
Copying data from file:/usr/local/dataFileTemp/Salaries.csv
Copying file: file:/usr/local/dataFileTemp/Salaries.csv
Loading data to table default.salaries
rmr: DEPRECATED: Please use 'rm -r' instead.
Deleted hdfs://hadoopnodeservice1/user/hive/warehouse/salaries
Table default.salaries stats: [numFiles=1, numRows=0, totalSize=676848, rawDataSize=0]
OK
Time taken: 1.239 seconds
hive> select * from salaries limit 10;
OK
1985    BAL     AL      murraed02       1472819.0
1985    BAL     AL      lynnfr01        1090000.0
1985    BAL     AL      ripkeca01       800000.0
1985    BAL     AL      lacyle01        725000.0
1985    BAL     AL      flanami01       641667.0
1985    BAL     AL      boddimi01       625000.0
1985    BAL     AL      stewasa01       581250.0
1985    BAL     AL      martide01       560000.0
1985    BAL     AL      roeniga01       558333.0
1985    BAL     AL      mcgresc01       547143.0
Time taken: 0.339 seconds, Fetched: 10 row(s)



替换表中列的话

hive> alter table salaries replace columns (
    > yearid int,
    > teamid string,
    > playerid string,
    > salary float);
OK
Time taken: 0.236 seconds



该语句会将原来的数据库中的列删除后再新建列。

hive> alter table salaries replace columns (
    > yearid int,
    > teamid string,
    > lgid string,
    > playerid string,
    > salary float);
OK
Time taken: 0.178 seconds
hive> select * from salaries limit 10;
OK
1985    BAL     AL      murraed02       1472819.0
1985    BAL     AL      lynnfr01        1090000.0
1985    BAL     AL      ripkeca01       800000.0
1985    BAL     AL      lacyle01        725000.0
1985    BAL     AL      flanami01       641667.0
1985    BAL     AL      boddimi01       625000.0
1985    BAL     AL      stewasa01       581250.0
1985    BAL     AL      martide01       560000.0
1985    BAL     AL      roeniga01       558333.0
1985    BAL     AL      mcgresc01       547143.0
Time taken: 0.148 seconds, Fetched: 10 row(s)



将表的数据列替换之后数据好像不会丢失的感觉。这里注意一下。

hive> alter table salaries replace columns (
    > yearid int);
OK
Time taken: 1.172 seconds
hive> select * from salaries limit 10;
OK
1985
1985
1985
1985
1985
1985
1985
1985
1985
1985
Time taken: 0.386 seconds, Fetched: 10 row(s)


然后将数据库的表中列进行替换,替换为原来的列信息,那么数据就又回来了。


hive> select * from salaries limit 10;
OK
1985    BAL     AL      murraed02       1472819.0
1985    BAL     AL      lynnfr01        1090000.0
1985    BAL     AL      ripkeca01       800000.0
1985    BAL     AL      lacyle01        725000.0
1985    BAL     AL      flanami01       641667.0
1985    BAL     AL      boddimi01       625000.0
1985    BAL     AL      stewasa01       581250.0
1985    BAL     AL      martide01       560000.0
1985    BAL     AL      roeniga01       558333.0
1985    BAL     AL      mcgresc01       547143.0
Time taken: 0.111 seconds, Fetched: 10 row(s)