注:hive其他语法在hive官网有说明,建议初学者,去官网学习一手的资料, 
官网:https://cwiki.apache.org/confluence/display/Hive/Home#Home-UserDocumentation

官网说明
Hive建表方式共有三种:

直接建表法
查询建表法
like建表法

首先看官网介绍 

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...)                  -- (Note: Available in Hive 0.10.0 and later)]
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]   -- (Note: Available in Hive 0.6.0 and later)
 
 
 
[AS select_statement];   -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
 
 
 
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE existing_table_or_view_name
  [LOCATION hdfs_path];
 
data_type
  : primitive_type
  | array_type
  | map_type
  | struct_type
  | union_type  -- (Note: Available in Hive 0.7.0 and later)
 
primitive_type
  : TINYINT
  | SMALLINT
  | INT
  | BIGINT
  | BOOLEAN
  | FLOAT
  | DOUBLE
  | DOUBLE PRECISION -- (Note: Available in Hive 2.2.0 and later)
  | STRING
  | BINARY      -- (Note: Available in Hive 0.8.0 and later)
  | TIMESTAMP   -- (Note: Available in Hive 0.8.0 and later)
  | DECIMAL     -- (Note: Available in Hive 0.11.0 and later)
  | DECIMAL(precision, scale)  -- (Note: Available in Hive 0.13.0 and later)
  | DATE        -- (Note: Available in Hive 0.12.0 and later)
  | VARCHAR     -- (Note: Available in Hive 0.12.0 and later)
  | CHAR        -- (Note: Available in Hive 0.13.0 and later)
 
array_type
  : ARRAY < data_type >
 
map_type
  : MAP < primitive_type, data_type >
 
struct_type
  : STRUCT < col_name : data_type [COMMENT col_comment], ...>
 
union_type
   : UNIONTYPE < data_type, data_type, ... >  -- (Note: Available in Hive 0.7.0 and later)
 
row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
        [NULL DEFINED AS char]   -- (Note: Available in Hive 0.13 and later)
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
 
file_format:
  : SEQUENCEFILE
  | TEXTFILE    -- (Default, depending on hive.default.fileformat configuration)
  | RCFILE      -- (Note: Available in Hive 0.6.0 and later)
  | ORC         -- (Note: Available in Hive 0.11.0 and later)
  | PARQUET     -- (Note: Available in Hive 0.13.0 and later)
  | AVRO        -- (Note: Available in Hive 0.14.0 and later)
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
 
constraint_specification:
  : [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE ]
    [, CONSTRAINT constraint_name FOREIGN KEY (col_name, ...) REFERENCES table_name(col_name, ...) DISABLE NOVALIDATE




一、hive的string和varchar的区别
 简要介绍:
Hive有2种类型用于存储变长文本。
1.Hive-0.12.0版本引入了VARCHAR类型,VARCHAR类型使用长度指示器(1到65355)创建,长度指示器定义了在字符串中允许的最大字符数量。如果一个字符串值转换为或者被赋予一个varchar值,其长度超过了长度指示器则该字符串值会自动被截断。
2.STRING存储变长的文本,对长度没有限制。理论上将STRING可以存储的大小为2GB,但是存储特别大的对象时效率可能受到影响,可以考虑使用Sqoop提供的大对象支持。
二、两者主要区别:
1.VARCHAR与STRING类似,但是STRING存储变长的文本,对长度没有限制;varchar长度上只允许在1-65355之间。
2.还没有通用的UDF可以直接用于VARCHAR类型,可以使用String UDF代替,VARCHAR将会转换为String再传递给UDF。


观察可发现一共有三种建表方式,接下来我们将一一讲解。

1.直接建表法:
create table table_name(col_name data_type);



一个复杂的例子

主要要按照上面的定义的格式顺序去进行编写

CREATE EXTERNAL TABLE IF NOT EXISTS `dmp_clearlog` (
  `date_log` string COMMENT 'date in file', 
  `hour` int COMMENT 'hour', 
  `device_id` string COMMENT '(android) md5 imei / (ios) origin  mac', 
  `imei_orgin` string COMMENT 'origin value of imei', 
  `mac_orgin` string COMMENT 'origin value of mac', 
  `mac_md5` string COMMENT 'mac after md5 encrypt', 
  `android_id` string COMMENT 'androidid', 
  `os` string  COMMENT 'operating system', 
  `ip` string COMMENT 'remote real ip', 
  `app` string COMMENT 'appname' )
COMMENT 'cleared log of origin log'
PARTITIONED BY (
  `date` date COMMENT 'date used by partition'
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
TBLPROPERTIES ('creator'='szh', 'crate_time'='2018-06-07')
;


这里我们针对里面的一些不同于关系型数据库的地方进行说明。
row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
        [NULL DEFINED AS char]   -- (Note: Available in Hive 0.13 and later)
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]

Hive将HDFS上的文件映射成表结构,通过分隔符来区分列(比如’,’ ‘;’ or ‘^’ 等),row format就是用于指定序列化和反序列化的规则。 
比如对于以下记录:
1,xiaoming,book-TV-code,beijing:chaoyang-shagnhai:pudong
2,lilei,book-code,nanjing:jiangning-taiwan:taibei
3,lihua,music-book,heilongjiang:haerbin

逗号用于分割列(FIELDS TERMINATED BY char:对应ID、name、hobby(数组形式,COLLECTION ITEMS TERMINATED BY char)、address(键值对形式map,MAP KEYS TERMINATED BY char)),而LINES TERMINATED BY char 用于区分不同条的数据,默认是换行符;


file format(HDFS文件存放的格式)
默认是TEXTFILE,即文本格式,可以直接打开。 

如下:根据上述文件内容,创建一个表t1
create table t1(
    id      int
   ,name    string
   ,hobby   array<string>
   ,add     map<String,string>
)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
;

下面插入数据 
注:一般很少用insert (不是insert overwrite)语句,因为就算就算插入一条数据,也会调用MapReduce,这里我们选择Load Data的方式。
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

然后上载
load data local inpath '/home/hadoop/Desktop/data' overwrite into table t1;


external
未被external修饰的是内部表(managed table),被external修饰的为外部表(external table); 
区别: 
内部表数据由Hive自身管理,外部表数据由HDFS管理; 
内部表数据存储的位置是hive.metastore.warehouse.dir(默认:/user/hive/warehouse),外部表数据的存储位置由自己制定; 
删除内部表会直接删除元数据(metadata)及存储数据;删除外部表仅仅会删除元数据,HDFS上的文件并不会被删除; 
对内部表的修改会将修改直接同步给元数据,而对外部表的表结构和分区进行修改,则需要修复(MSCK REPAIR TABLE table_name;)

创建一个外部表t2
create external table t2(
    id      int
   ,name    string
   ,hobby   array<string>
   ,add     map<String,string>
)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
location '/user/t2'
;


装载数据
load data local inpath '/home/hadoop/Desktop/data' overwrite into table t2;


2.查询建表法
通过AS 查询语句完成建表:将子查询的结果存在新表里,有数据 
一般用于中间表
CREATE TABLE new_key_value_store
   ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
   STORED AS RCFile
   AS
SELECT (key % 1024) new_key, concat(key, value) key_value_pair
FROM key_value_store
SORT BY new_key, key_value_pair;

3.like建表法
会创建结构完全相同的表,但是没有数据。 
常用语中间表
CREATE TABLE empty_key_value_store
LIKE key_value_store;