Hive resource collection


xdark, 2014-03-20 17:41:03. Blog category: hadoop and nosql


1. Official documentation: https://cwiki.apache.org/confluence/display/Hive/LanguageManual

2. Partitions and buckets: http://blog.csdn.net/wisgood/article/details/17186107

3. Basic commands: http://hi.baidu.com/7636553/item/61d3ee1b5c27e0663f87ce37

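3. Passing a variable from the shell: hive -e 'sql where year=${env:year}' (use single quotes so the shell does not expand ${env:year} itself; Hive then substitutes the exported environment variable year)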

4. Filter the output of set: hive -S -e "set" | grep header

5. Show column names in query results: set hive.cli.print.header=true;

6. When creating a view, a field whose value is NULL will be missing from the result, so a default value is usually set for it.

7. Before the prompt appears, Hive executes the .hiverc file in the home directory. Commands that should run at startup are usually placed in this file.

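8. Default delimiters: fields are separated by Ctrl-A (\001), struct and array elements by Ctrl-B (\002), and map keys from their values by Ctrl-C (\003).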
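To make the delimiter note concrete, here is a minimal Java sketch that splits one line of a text-format table using these defaults. The class and method names are hypothetical, and the sketch assumes that map entries are separated from each other by \002, as array elements are. It only illustrates the layout and is not how Hive itself parses rows.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
class HiveDefaultDelimiters {
    static final String FIELD = "\001";   // Ctrl-A between columns
    static final String ITEM = "\002";    // Ctrl-B between array/struct elements and map entries
    static final String MAP_KEY = "\003"; // Ctrl-C between a map key and its value

    // Split a raw line into its column values.
    static List<String> fields(String line) {
        return Arrays.asList(line.split(FIELD, -1));
    }

    // Split one array or struct column into its elements.
    static List<String> items(String field) {
        return Arrays.asList(field.split(ITEM, -1));
    }

    // Split one map column into key/value pairs, keeping insertion order.
    static Map<String, String> map(String field) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String entry : field.split(ITEM, -1)) {
            String[] kv = entry.split(MAP_KEY, 2);
            result.put(kv[0], kv.length > 1 ? kv[1] : null);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample row: a name, an array of names, a map of deductions.
        String line = "John" + FIELD + "Mary" + ITEM + "Tom" + FIELD
                + "tax" + MAP_KEY + "0.2" + ITEM + "ins" + MAP_KEY + "0.1";
        List<String> f = fields(line);
        System.out.println(items(f.get(1))); // prints [Mary, Tom]
        System.out.println(map(f.get(2)));   // prints {tax=0.2, ins=0.1}
    }
}
```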

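9. When several joins in one query all use the same join key, only one MapReduce job is produced.

10. In a join query, table sizes should increase from left to right, because Hive buffers the earlier tables and streams the last one.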

11. A query of the form

select * from a where a.i in (select i from b)

can be replaced with

select a.* from a left semi join b on a.i = b.i;

This returns the rows of the left table that satisfy the ON condition.
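The difference from an ordinary inner join is easiest to see on duplicate keys. The following Java sketch models the two tables as plain lists of join keys (a simplification; the class and method names are hypothetical) and shows that a semi join emits each left row at most once, whereas an inner join emits one row per matching pair.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of join semantics, for illustration only.
class LeftSemiJoinSketch {
    // Keep each left key that has at least one match on the right.
    static List<Integer> leftSemiJoin(List<Integer> left, List<Integer> right) {
        Set<Integer> rightKeys = new HashSet<>(right);
        List<Integer> out = new ArrayList<>();
        for (Integer key : left) {
            if (key != null && rightKeys.contains(key)) {
                out.add(key); // emitted once, however many right rows match
            }
        }
        return out;
    }

    // For contrast: an inner join emits one row per matching (left, right) pair.
    static List<Integer> innerJoin(List<Integer> left, List<Integer> right) {
        List<Integer> out = new ArrayList<>();
        for (Integer l : left) {
            for (Integer r : right) {
                if (l != null && l.equals(r)) {
                    out.add(l);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 2, 2, 3);
        List<Integer> b = Arrays.asList(2, 2, 4);
        System.out.println(leftSemiJoin(a, b)); // prints [2, 2]
        System.out.println(innerJoin(a, b));    // prints [2, 2, 2, 2]
    }
}
```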

12. Join types

INNER JOIN: inner join

LEFT OUTER JOIN: left outer join

RIGHT OUTER JOIN: right outer join

FULL OUTER JOIN: full outer join

LEFT SEMI JOIN: left semi join

JOIN without an ON condition: Cartesian product

13. View a compressed file: dfs -cat

14. Strip the file header and compression from a sequence file: dfs -text xx

15. Table-generating functions:

select explode(array(1,2,3)) as ele from src;

Used as an intermediate column through a lateral view:

select name, sub
from employee
lateral view explode(subordinates) subView as sub;
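What the lateral view query produces can be sketched in plain Java as a nested loop: every employee row is paired with each element of its subordinates array. The sketch models the employee table as a map from name to subordinates, which is an assumption made only for illustration, and the class and method names are hypothetical. Employees with an empty array yield no rows, which matches LATERAL VIEW without the OUTER keyword.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of LATERAL VIEW explode, for illustration only.
class LateralViewSketch {
    // Emit one (name, sub) row per element of each employee's subordinates list.
    static List<List<String>> explodeSubordinates(Map<String, List<String>> employees) {
        List<List<String>> rows = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : employees.entrySet()) {
            for (String sub : e.getValue()) {
                rows.add(Arrays.asList(e.getKey(), sub));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample data.
        Map<String, List<String>> employees = new LinkedHashMap<>();
        employees.put("John", Arrays.asList("Mary", "Todd"));
        employees.put("Bill", Collections.<String>emptyList());
        System.out.println(explodeSubordinates(employees)); // prints [[John, Mary], [John, Todd]]
    }
}
```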

16. UDF / generic UDF

add jar /fullpath

create temporary function xx as 'org.package'

drop temporary function if exists xx

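import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.hadoop.hive.ql.exec.UDF;

// Hive UDF that maps a birthday to a zodiac sign (excerpt).
public class UDFZodiacSign extends UDF {
    // Kept from the original notes; not used by the evaluate methods shown here.
    private SimpleDateFormat df;
    public UDFZodiacSign() {
        df = new SimpleDateFormat("yyyy-MM-dd");
    }
    // Date.getMonth() is 0-based and Date.getDay() is the day of the week,
    // so use getMonth() + 1 and getDate() for the calendar month and day.
    public String evaluate(Date bday) {
        return this.evaluate(bday.getMonth() + 1, bday.getDate());
    }
    public String evaluate(int month, int day) {
        // Only this branch appears in the notes; every other month returns null.
        if (month == 1) {
            return "baiyang";
        }
        return null;
    }
}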