好程序员 Big Data Learning Path: Hive Built-in Functions


wx5d42865f47214 2019-08-12 16:31:56 Category: Big Data Training

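© Copyright belongs to the author. This is an original work by 51CTO blog author wx5d42865f47214. Contact the author for permission before reposting; unauthorized reposting may be subject to legal action.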

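This installment of the 好程序员 big data learning path covers Hive built-in functions. We keep updating the learning path and hope it helps everyone who is studying big data.

1. Random number function: rand()
Syntax: rand(), rand(int seed)
Returns: double
Description: Returns a random number between 0 and 1. If seed is specified, the sequence of random numbers is reproducible.

    select rand();
    select rand(10);

2. String split function: split(str, pat)
Syntax: split(string str, string pat)
Returns: array
Description: Splits str around pat and returns the resulting array of strings. pat is a regular expression, so special separators such as "." must be escaped.

    select split("5.0", "\\.")[0];
    select split(cast(rand(10)*100 as string), "\\.")[0];

3. Substring functions: substr, substring
Syntax: substr(string A, int start), substring(string A, int start)
Returns: string
Description: Returns the part of string A from position start to the end.
Syntax: substr(string A, int start, int len), substring(string A, int start, int len)
Returns: string
Description: Returns the part of string A that begins at position start and has length len.

    select substr(rand()*100, 0, 2);
    select substring(rand()*100, 0, 2);

4. If function: if
Syntax: if(boolean testCondition, T valueTrue, T valueFalseOrNull)
Returns: T
Description: Returns valueTrue when testCondition is TRUE; otherwise returns valueFalseOrNull.

    select if(100>10, "this is true", "this is false");
    select if(2=1, "male", "female");
    select if(1=1, "male", (if(1=2, "female", "unknown")));
    select if(3=1, "male", (if(3=2, "female", "unknown")));

5. Conditional function: CASE
First form:
Syntax: CASE WHEN a THEN b [WHEN c THEN d] [ELSE e] END
Returns: T
Description: If a is TRUE, returns b; if c is TRUE, returns d; otherwise returns e.
Second form:
Syntax: CASE a WHEN b THEN c [WHEN d THEN e] [ELSE f] END
Returns: T
Description: If a equals b, returns c; if a equals d, returns e; otherwise returns f.

    select
      case 6
        when 1 then "100"
        when 2 then "200"
        when 3 then "300"
        when 4 then "400"
        else "others"
      end;

    -- create the table
    create table if not exists cw(
      flag int
    );
    load data local inpath '/home/flag' into table cw;

    -- second form (CASE a WHEN b ...)
    select
      case c.flag
        when 1 then "100"
        when 2 then "200"
        when 3 then "300"
        when 4 then "400"
        else "others"
      end
    from cw c;

    -- first form (CASE WHEN a ...)
    select
      case
        when 1=c.flag then "100"
        when 2=c.flag then "200"
        when 3=c.flag then "300"
        when 4=c.flag then "400"
        else "others"
      end
    from cw c;

6. Regular expression replace function: regexp_replace
Syntax: regexp_replace(string A, string B, string C)
Returns: string
Description: Replaces every part of string A that matches the Java regular expression B with C. Some characters need to be escaped. This is similar to the regexp_replace function in Oracle.

    select regexp_replace("1.jsp", "\\.jsp", ".html");

7. Type conversion function: cast
Syntax: cast(expr as <type>)
Returns: <type>
Description: Converts expr to the given type and returns the converted value.

    select 1;
    select cast(1 as double);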
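    select cast("12" as int);

8. String concatenation function: concat; concatenation with a separator: concat_ws
Syntax: concat(string A, string B…)
Returns: string
Description: Returns the concatenation of the input strings. Any number of input strings is supported.
Syntax: concat_ws(string SEP, string A, string B…)
Returns: string
Description: Returns the concatenation of the input strings, with SEP as the separator between them.

    -- + is arithmetic in Hive, not string concatenation
    select "千峰" + 1603 + "class";
    select concat("千峰", 1603, "class");
    select concat_ws("|", "千峰", "1603", "class");

9. Ranking functions:
row_number(): no ties; every row gets its own consecutive number
rank(): tied rows share a rank, and the following ranks are skipped
dense_rank(): tied rows share a rank, and no ranks are skipped

-- data
id  class  score
1   1      90
2   1      85
3   1      87
4   1      60
5   2      82
6   2      70
7   2      67
8   2      88
9   2      93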

-- top three per class
id  class  score  rn
1   1      90     1
3   1      87     2
2   1      85     3
9   2      93     1
8   2      88     2
5   2      82     3

    create table if not exists uscore(
      uid int,
      classid int,
      score double
    )
    row format delimited fields terminated by '\t';

    load data local inpath '/home/uscore' into table uscore;

    select
      u.uid,
      u.classid,
      u.score
    from uscore u
    group by u.classid, u.uid, u.score
    limit 3;

    select
      u.uid,
      u.classid,
      u.score,
      row_number() over(distribute by u.classid sort by u.score desc) rn
    from uscore u;

Take the top three in each class:

    select
      t.uid,
      t.classid,
      t.score
    from (
      select
        u.uid,
        u.classid,
        u.score,
        row_number() over(distribute by u.classid sort by u.score desc) rn
      from uscore u
    ) t
    where t.rn < 4;

Compare the three ranking functions:

    select
      u.uid,
      u.classid,
      u.score,
      row_number() over(distribute by u.classid sort by u.score desc) rn,
      rank() over(distribute by u.classid sort by u.score desc) rk,
      dense_rank() over(distribute by u.classid sort by u.score desc) dr
    from uscore u;

10. Aggregate functions:
min() max() count() count(distinct) sum() avg()
count(1): counts every row, whether or not the row contains any values
count(*): counts every row, including rows whose columns are all NULL
count(col): counts only the rows where col is not NULL
count(distinct col): counts the distinct non-NULL values of col

11. NULL operations
Almost any operation between a value and NULL returns NULL.

    select 1+null;
    select 1/0;
    select null%2;

12. Equality operations

    select null=null;    -- NULL
    select null<=>null;  -- true
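The difference between the count variants in the aggregate functions section is easiest to see on a handful of rows that include a NULL. The query below is a minimal sketch, not part of the original article: the inline subquery t and its column col are hypothetical sample data, and it assumes a Hive version that allows select without a from clause (0.13 or later).

```sql
-- Hypothetical sample data: four rows, values 1, 1, 2 and NULL.
select
  count(1)            as c_one,       -- 4: every row
  count(*)            as c_star,      -- 4: every row, NULLs included
  count(col)          as c_col,       -- 3: rows where col is not NULL
  count(distinct col) as c_distinct   -- 2: distinct non-NULL values (1 and 2)
from (
  select 1 as col
  union all select 1 as col
  union all select 2 as col
  union all select cast(null as int) as col
) t;
```

Only count(col) and count(distinct col) are affected by the NULL row; count(1) and count(*) return the same number.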