Differences between HiveQL and standard SQL:

Pitfall 1:

SELECT * 
FROM first_table t1
JOIN second_table t2
ON t1.id = t2.id
WHERE t1.date = "2016-06-01"

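Hive has no SQL optimizer comparable to the one in a traditional relational database. Written this way, the query first joins t1 and t2 in full, which generates a large number of MapReduce operations, and only then applies the filter on t1.date.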
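
One way to check where the filter actually runs on a given installation is to look at the query plan. The following is a minimal sketch that assumes the same two tables as above; the plan text differs between Hive versions and settings, so no output is shown.

-- Print the execution plan instead of running the query
EXPLAIN
SELECT *
FROM first_table t1
JOIN second_table t2
ON t1.id = t2.id
WHERE t1.date = "2016-06-01";

If the Filter Operator on t1.date appears only after the Join Operator, the tables are being joined in full before filtering. Newer Hive releases may push the predicate down on their own when hive.optimize.ppd is enabled, in which case the filter shows up before the join.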

Correct form:

SELECT * 
FROM first_table t1
JOIN second_table t2
ON t1.id = t2.id
AND t1.date = "2016-06-01"
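
Another common way to get the same effect is to filter t1 in a subquery before the join. This is only a sketch using the same table and column names as above, not the only correct form; the subquery keeps the alias t1 so the join condition reads the same. It also stays correct if the join is later changed to a LEFT OUTER JOIN, where a condition on t1 inside ON would no longer remove rows from t1.

SELECT *
FROM (
  -- Restrict first_table to a single day before it takes part in the join
  SELECT *
  FROM first_table
  WHERE date = "2016-06-01"
) t1
JOIN second_table t2
ON t1.id = t2.id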