Differences between HiveQL and standard SQL:
Pitfall 1:
SELECT *
FROM first_table t1
JOIN second_table t2
ON t1.id = t2.id
WHERE t1.date = "2016-06-01"
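Hive has no SQL optimizer, so a query written this way first joins t1 and t2 in full, which generates a large number of MapReduce operations, and applies the filter only afterwards.
Correct form: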
SELECT *
FROM first_table t1
JOIN second_table t2
ON t1.id = t2.id
AND t1.date = "2016-06-01"
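
The same idea can also be written with a subquery, so that only the rows for the chosen date reach the join. The sketch below is an illustration rather than part of the original note. It reuses the table and column names from the queries above and assumes a Hive version that accepts them as written. The trailing EXPLAIN statement is one way to check where the filter ends up in the plan on your own cluster.

```sql
-- Sketch: filter first_table before the join instead of after it.
SELECT *
FROM (
  SELECT *
  FROM first_table
  WHERE date = "2016-06-01"   -- applied while scanning first_table
) t1
JOIN second_table t2
ON t1.id = t2.id;

-- Inspect the plan to see at which stage the date filter is applied.
EXPLAIN
SELECT *
FROM first_table t1
JOIN second_table t2
ON t1.id = t2.id
AND t1.date = "2016-06-01";
```

For an inner join like this one, moving the condition from WHERE into ON (or into a subquery) does not change the result. For outer joins it can change the result, so check such queries before applying the same rewrite.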