一、Hive中“一次使用的命令”

1. -S表示静默模式,结果去掉OK和Time taken等行

hive -e "select * from movie_table limit 3"
...
OK
movieId title   genres
1       Toy Story (1995)        Adventure|Animation|Children|Comedy|Fantasy
2       Jumanji (1995)  Adventure|Children|Fantasy
Time taken: 11.631 seconds, Fetched: 3 row(s)


[root@master hive-1.2.2]# hive -S -e "select * from movie_table limit 3" 
...
movieId title   genres
1       Toy Story (1995)        Adventure|Animation|Children|Comedy|Fantasy
2       Jumanji (1995)  Adventure|Children|Fantasy

2.将查询结果输出到本地文件test.txt中(非HDFS)               test.txt是程序执行时自己创建的,无需提前创建

[root@master hive-1.2.2]# hive -S -e "select * from movie_table limit 3" > /usr/local/src/test3/hive/test.txt

(py27tf) [root@master hive-1.2.2]# cat /usr/local/src/test3/hive/test.txt
movieId title   genres
1       Toy Story (1995)        Adventure|Animation|Children|Comedy|Fantasy
2       Jumanji (1995)  Adventure|Children|Fantasy

通过以下命令查询管理表的warehouse属性记录(我也还没搞懂,再看看书P35)

[root@master hive-1.2.2]# hive  -e "seT" | grep warehouse

hive.metastore.warehouse.dir=/user/hive/warehouse
hive.warehouse.subdir.inherit.perms=true

 

 

3.文件执行hive查询          shell:source         终端使用hive  -f 得到的结果相同

hive>  source /usr/local/src/test3/select.sql;
OK
movieId title   genres
1       Toy Story (1995)        Adventure|Animation|Children|Comedy|Fantasy
2       Jumanji (1995)  Adventure|Children|Fantasy
Time taken: 2.976 seconds, Fetched: 3 row(s)


[root@master test3]# hive -f /usr/local/src/test3/select.sql 
...
OK
movieId title   genres
1       Toy Story (1995)        Adventure|Animation|Children|Comedy|Fantasy
2       Jumanji (1995)  Adventure|Children|Fantasy
Time taken: 5.915 seconds, Fetched: 3 row(s)

表里写入数据

1.建表
create table test(line string);
OK
Time taken: 0.82 seconds
2.插数据
[root@master hive-1.2.2]# hive -e "load data local inpath '/usr/local/src/test3/test.txt' into table test"
或者
hive> load data local inpath '/usr/local/src/test3/test.txt' overwrite into table test;

3.查询
hive> select * from test;
OK
one row
Time taken: 0.4 seconds, Fetched: 1 row(s)

执行shell命令

在命令前加上! 并以;结尾

hive> ! pwd;
/usr/local/src/hive-1.2.2
hive> ! echo "what up dog";
"what up dog"

 

在hive内使用hadoop命令   hadoop dfs -ls /  只需要去掉hadoop即可

在hive内使用hadoop命令的优点:hadoop中每次都会启动一个新的JVM实例,而hive在同一个进程中执行

hive> dfs -ls / ;
Found 8 items
drwxr-xr-x   - root supergroup          0 2019-03-04 09:41 /7
-rw-r--r--   3 root supergroup     632207 2018-12-13 14:01 /The_Man_of_Property.txt
-rw-r--r--   3 root supergroup        698 2019-05-13 18:32 /a8a
drwxr-xr-x   - root supergroup          0 2019-05-27 10:29 /hbase
drwxr-xr-x   - root supergroup          0 2019-05-27 14:47 /hbase_test
drwxr-xr-x   - root supergroup          0 2019-05-26 16:49 /hive
drwx-wx-wx   - root supergroup          0 2019-05-26 10:18 /tmp
drwxr-xr-x   - root supergroup          0 2019-05-26 10:43 /user

Hive脚本注释      使用--开头的字符串表示注释

Tips:CLI不能解析注释,会产生报错信息,只能在脚本中通过hive -f **.hql 的方式执行

--copyright (c) 2012 Megacorp,LLC.
--This is the best Hive script evar!!

select * from table;

显示字段名称(默认是关闭的):    (可在$HOME/.HIVRRC文件中配置修改为默认开启:set hive.cli.print.header=true;)

字段名称为:movie_table.movieid     movie_table.title       movie_table.genres   

hive> select * from movie_table limit 2;
OK
movieId title   genres
1       Toy Story (1995)        Adventure|Animation|Children|Comedy|Fantasy
Time taken: 0.159 seconds, Fetched: 2 row(s)

hive> set hive.cli.print.header=true;

hive> select * from movie_table limit 2;
OK
movie_table.movieid     movie_table.title       movie_table.genres
movieId title   genres
1       Toy Story (1995)        Adventure|Animation|Children|Comedy|Fantasy
Time taken: 0.148 seconds, Fetched: 2 row(s)

Chapter 4      HiveQL:数据定义(创建、修改、删除数据库、表、视图、函数和索引)

1.Hive不支持行级插入操作、更新操作和删除操作。Hive也不支持事物。

2.Hive中数据库的概念本质上仅仅只是表的一个目录或者命名空间

3.创建数据库financials

hive> create database financials;
OK
Time taken: 0.114 seconds

hive> show databases;
OK
default
financials
Time taken: 0.036 seconds, Fetched: 2 row(s)

如果数据库financials已经存在,创建时就会报错,以下命令可以不抛出错误信息;

hive> create database if not exists financials;

用正则表达式匹配筛选数据库名(列举出所有f开头的数据库)

hive> show databases like 'f*';
OK
financials
Time taken: 0.051 seconds, Fetched: 1 row(s)

Hive为每一个数据库创建一个目录(目录名以*.db结尾),数据库中的表将会以这个数据库目录的子目录形式存储(default除外,因为default库本身没有自己的目录)           

修改数据库目录:

1.配置文件修改

数据库的HDFS目录存储由hive-site.xml文件配置 :   set  hive.metastore.warehouse.dir=/user/hive/warehouse(这个目录是配置项默认配置,也可自己修改存储目录)

[root@master Programming_Hive]# hadoop fs -ls /user/hive/warehouse
Found 2 items
drwxr-xr-x   - root supergroup          0 2019-05-28 10:49 /user/hive/warehouse/financials.db
drwxr-xr-x   - root supergroup          0 2019-05-28 10:50 /user/hive/warehouse/human_resources.db

2.CLI修改数据库目录:(只修改当前数据库位置)   还可以在CLI增加数据库的描述

hive> create database change22
    > comment 'holds all financial tables'
    > location '/hive_test';
OK
Time taken: 0.043 seconds

hive> desc database change22;
OK
change22        holds all financial tables      hdfs://master:9000/hive_test    root    USER
Time taken: 0.038 seconds, Fetched: 1 row(s)

为数据库增加何其相关的键—值对属性信息,

查询时使用语句:desc database extended change;

hive> create database change
    > with dbproperties ("creator" = "Jason Chan","data"="2019-05-28");
OK
Time taken: 0.109 seconds

hive> desc database extended change;
OK
change          hdfs://master:9000/user/hive/warehouse/change.db        root    USER    {creator=Jason Chan, data=2019-05-28}
Time taken: 0.02 seconds, Fetched: 1 row(s)

修改或者新增数据库属性(不能删除)

hive> desc database extended change;
change          hdfs://master:9000/user/hive/warehouse/change.db        root    USER    {creator=Jason Chan, data=2019-05-28}

hive> alter database change set dbproperties ("creator"="jason");

hive> desc database extended change;
change          hdfs://master:9000/user/hive/warehouse/change.db        root    USER    {creator=jason, data=2019-05-28}

hive> alter database change set dbproperties ("edited-by"="Joe");

hive> desc database extended change;
change          hdfs://master:9000/user/hive/warehouse/change.db        root    USER    {creator=jason, data=2019-05-28, edited-by=Joe}

 

 

使用某个数据库:use

hive> show databases;
OK
default
financials
Time taken: 0.033 seconds, Fetched: 2 row(s)

hive> use financials;
OK
Time taken: 0.065 seconds

查询当前使用的数据库(当前的表属于哪个数据库):set hive.cli.print.current.db=true;

hive> set hive.cli.print.current.db=true;

hive (financials)> use default;
OK
Time taken: 0.032 seconds

hive (default)> set hive.cli.print.current.db=false;

删除数据库:drop

hive> drop database if exists human_resources;
OK
Time taken: 0.078 seconds

Hive不允许删除有表的数据库

hive> use traffic;
OK
Time taken: 0.039 seconds
    //traffic数据库下面有表
hive> show tables;
OK
monitor_camera_info
monitor_flow_action
Time taken: 0.05 seconds, Fetched: 2 row(s)
    //删除traffic数据库会报错
hive> drop database traffic;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidOperationException(message:Database traffic is not empty. One or more tables exist.)

解决方案:

1.先清空数据库下的表:

//删除表
hive> drop table monitor_camera_info;
OK
Time taken: 0.132 seconds
hive> drop table monitor_flow_action;
OK
Time taken: 0.236 seconds
    //此时再删除数据库就可以了
hive> drop database traffic;
OK
Time taken: 0.108 seconds

2.使用关键字: cascade

//数据库traffic下有表
hive> show tables;
OK
monitor_camera_info
monitor_flow_action
Time taken: 0.06 seconds, Fetched: 2 row(s)

    //正常删库报错
hive> drop database if exists traffic; 
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidOperationException(message:Database traffic is not empty. One or more tables exist.)

    //使用关键字restrict也不能删除数据库    
hive> drop database if exists traffic restrict; 
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidOperationException(message:Database traffic is not empty. One or more tables exist.)

    //使用关键字cascade可以正常删除
hive> drop database if exists traffic cascade; 
OK
Time taken: 0.25 seconds

创建表(change数据库下创建employees表),在每个字段类型后面追加一个注释

TBLPROPERTIES作用:按键值对的格式为表增加额外的文档说明 (还可作为表示关于数据库连接的必要的元数据信息)      P53

Hive会自动增加两个属性:

1.last_modified_by,保存最后修改这个表的用户的用户名

2.last_modified_time,保存着最后一次修改的新纪元时间秒

//建表sql语句
create table if not exists change.employees3 (
    name            string  comment "employee name",

    salary          float   comment "employee salary",

    subordinates    array<string>   comment "names of subordinates",

    deductions      map<string,float>
                    comment "keys are deductions names, values are percentages",

    address         struct<street:string,city:string,state:string,zip:int>
                    comment "home address")
            
comment "description of the table"
TBLPROPERTIES ('creator'='me','created_at'='2019-05-28')
;


    //查看结果

hive> desc  employees3;
OK
name                    string                  employee name       
salary                  float                   employee salary     
subordinates            array<string>           names of subordinates
deductions              map<string,float>       keys are deductions names, values are percentages
address                 struct<street:string,city:string,state:string,zip:int>  home address        
Time taken: 0.116 seconds, Fetched: 5 row(s)

为表本身添加注释,自定义一个或多个表属性

怎么查看表的描述:comment "description of the table"   ???      show tblproperties employees3;

hive> show tblproperties employees3;
OK
comment description of the table
created_at      2019-05-28
creator me
transient_lastDdlTime   1559027367
Time taken: 0.128 seconds, Fetched: 4 row(s)

 

 

 

 

用户还可以拷贝一张已经存在的表的模式(不拷贝数据):   在change数据库下创建表employees4(要求表模式与employees3相同)

创建employee4.3_2.sql文件输入下面语句,然后执行hive   -f   employee4.3_2.sql

[root@master Programming_Hive]# cat employee4.3_2.sql 
create table if not exists change.employees4
like change.employees3;

[root@master Programming_Hive]# hive  -f  employee4.3_2.sql

查看表employees4的属性

hive> show tables;
OK
employees3
employees4
Time taken: 0.081 seconds, Fetched: 2 row(s)
hive> desc employees4;
OK
name                    string                  employee name       
salary                  float                   employee salary     
subordinates            array<string>           names of subordinates
deductions              map<string,float>       keys are deductions names, values are percentages
address                 struct<street:string,city:string,state:string,zip:int>  home address        
Time taken: 0.136 seconds, Fetched: 5 row(s)

 

 

 

 

 

在default数据库下查看change数据库的表         show tables in change;

hive> set hive.cli.print.current.db=true;
hive (default)> show tables in change;
OK
employees3
employees4
Time taken: 0.051 seconds, Fetched: 2 row(s)
hive (default)> set hive.cli.print.current.db=false;
hive>

使用正则表达式查询表:          show  tables   'empl*';     查询empl开头的所有的表

hive> show tables "empl*";
OK
employees3
employees4
Time taken: 0.034 seconds, Fetched: 2 row(s)

 

 

在数据库default下查找数据库change的表的结构信息。desc extended change.employees3;    实际我们更倾向于使用FORMATTED,因为输出内容更详细,且可读性较好

hive (default)> desc extended change.employees3;
OK
name                    string                  employee name       
salary                  float                   employee salary     
subordinates            array<string>           names of subordinates
deductions              map<string,float>       keys are deductions names, values are percentages
address                 struct<street:string,city:string,state:string,zip:int>  home address        
                 
Detailed Table Information      Table(tableName:employees3, dbName:change, owner:root
...
location:hdfs://master:9000/user/hive/warehouse/change.db/employees3,  
...
parameters:{creator=me, transient_lastDdlTime=1559027367, created_at=2019-05-28, comment=description of the table}, ...)
Time taken: 0.091 seconds, Fetched: 7 row(s)

查看表的某一列信息:salary列    (加不加extended结果是一样的)

hive (change)> describe employees3.salary;
OK
salary                  float                   from deserializer   
Time taken: 0.117 seconds, Fetched: 1 row(s)

 

 

4.3.2外部表

创建外部表,读取HDFS目录:/hive/programming_hive/data/stocks下所有的文件

删除外表,并不会删掉这个表中的数据,只是删除描述表的元数据

[root@master Programming_Hive]# cat 4.3.2stock.sql                                              create external table if not exists stocks (
`exchange` string,
`symbol` string,
`ymd` string,
`price_open` float,
`price_high` float,
`price_low` float,
`price_close` float,
`volume` int,
`price_adj_close` float)
row format delimited fields terminated by ','
location '/hive/programming_hive/data/stocks';

exchange 字段为hive保留字段,修改之后即可。如果非要用这个字段名,就用反引号(Tab上面的键)引起来:

严格来说:Hive是管理着管理表和外部表的的目录和文件,但是并没有对表具有完全的控制权限

 

查看表是管理表还是外部表:desc  formatted movie_table

hive (default)> desc FORMATTED movie_table;
...  
Table Type:             EXTERNAL_TABLE                    
...    
hive (default)> desc FORMATTED jason;
...    
Table Type:             MANAGED_TABLE            
...

创建外部表,但是可以复制内部表的结构(不复制数据)    employees5外部表,employees3内部表

        如果employees为外部表,external可省略,复制的表依然为外部表

[root@master Programming_Hive]# cat 4.3.2external_table.sql 
create table if not exists change.employees5 
like change.employees3
location '/hive/programming_hive/test';

[root@master Programming_Hive]# hive -f 4.3.2external_table.sql 
...
OK
Time taken: 1.939 seconds

 查看结构employees3和employees5一样,但employees5表的类型为external

hive (change)> desc formatted employees5;
OK
# col_name              data_type               comment             
                 
name                    string                  employee name       
salary                  float                   employee salary     
subordinates            array<string>           names of subordinates
deductions              map<string,float>       keys are deductions names, values are percentages
address                 struct<street:string,city:string,state:string,zip:int>  home address        
...                   
Table Type:             EXTERNAL_TABLE

 

4.4  分区表、管理表

如果表中的数据以及分区个数都非常大的话,执行一个包含所有分区的查询可能会触发一个巨大的MapReduce任务。

建议:将Hive设置为strict模式,(对分区表进行查询而where子句没有加分区过滤器,将会禁止提交这个任务)
 

hive> set hive.mapred.mode=strict;

hive> select e.name,e.salary from employees e limit 10;

FAILED:Error in semantic analysis: No partition predicate found for Alias  "e" Table  "employees"

hive> set hive.mapred.mode=nonstrict;

hive> select e.name,e.salary from employees e limit 10;

John  Doe    10000.0  

...

 

查看分区:show  partitions 

hive (default)> show partitions rating_table_p;
OK
dt=2008-03
dt=2008-08
Time taken: 0.136 seconds, Fetched: 2 row(s)

查看分区键的方法:desc extended rating_table_p;                         partitionKeys分区键

hive (default)> desc extended rating_table_p;
OK
userid                  string                                      
movieid                 string                                      
rating                  string                                      
dt                      string                                      
                 
# Partition Information          
# col_name              data_type               comment             
                 
dt                      string                                      
                 
partitionKeys:[FieldSchema(name:dt, type:string, comment:null)]

分区:

1.建表的时候创建分区

2.加载数据的时候创建分区

 

 

4.4.1