translate()函数详解
- 0-需求
- 1-数据分析
- 2-总结
0-需求
删除某个字段中不需要的字符,如某字段为字符串,我需要删除字符串中所有的元音字符,怎么删除?
这里举例:给出样例字符串"ABCDEFGHIOUUFEBCAADDEEII",我需要删除该字符串中的元音字符,也就是出现在该字段中包含"AEIOU"任意一个字符都需要删除掉。
1-数据分析
该需求的难点是需要去匹配"AEIOU"中的每一个字符,匹配出来后再进行删除,而每一个字符串中却包含了元音字符的任意组合,呈现出一定的无规律性,感觉无从下手,面对这种无规律性,存在随机组合的匹配模式的删除,我们采用数据库中提供的字符替换函数translate()函数。
translate()函数用法如下:
hive> desc function extended translate;
OK
translate(input, from, to) - translates the input string by replacing the characters present in the from string with the corresponding characters in the to string
translate(string input, string from, string to) is an equivalent function to translate in PostGreSQL. It works on a character by character basis on the input string (first parameter). A character in the input is checked for presence in the from string (second parameter). If a match happens, the character from to string (third parameter) which appears at the same index as the character in from string is obtained. This character is emitted in the output string instead of the original character from the input string. If the to string is shorter than the from string, there may not be a character present at the same index in the to string. In such a case, nothing is emitted for the original character and it's deleted from the output string.
翻译 translate函数是和postgresql中的translate函数一样的功能,它会逐个的检查input字符串每个字
符,如果在input中的某个字符在from的字符串中存在,则替换为在to字符串中的相同位置下的字符。
如果to字符串比from短,那么在from字符串中的某个字符可能在to字符串中找不到相应的字符,那么就会在原
字符串input中删除这个字符
For example,
translate('abcdef', 'adc', '19') returns '1b9ef' replacing 'a' with '1', 'd' with '9' and removing 'c' from the input string
translate('a b c d', ' ', '') return 'abcd' removing all spaces from the input string
If the same character is present multiple times in the input string, the first occurence of the character is the one that's considered for matching. However, it is not recommended to have the same character more than once in the from string since it's not required and adds to confusion.
For example,
translate('abcdef', 'ada', '192') returns '1bc9ef' replaces 'a' with '1' and 'd' with '9' ignoring the second occurence of 'a' in the from string mapping it to '2'
Time taken: 0.018 seconds, Fetched: 13 row(s)举例如下:
0: jdbc:hive2://10.9.4.117:10000> select translate('abcdef', 'adc', '19');
+--------+--+
| _c0 |
+--------+--+
| 1b9ef |
+--------+--+
1 row selected (0.862 seconds)上述代码解释:输入字符串为’abcdef’,其中需要被替换的字符为’adc’,此时在输入字符串中检查是否存在adc字符,如果存在则字符a用字符1代替,字符d则用字符9代替,字符c在to位置上为空,因此用空代替,其余的则保持原样输出,最终的输出结果为:1b9ef。
具体含义:
translate(input, from, to)
- input:输入字符串【集是要被替换的字符串】
- from:需要匹配的字符【即需要被替换的字符】,这里一定要注意是字符不是字符串
- to :用哪些字符来替换被匹配到的字符
- 注意点:这里from的字符与to字符在位置上存在一 一对应关系,也就是from中每个位置上的字符用to中对应位置的字符替换。
使用规则:
- 如果 from 字符串长度=to的字符串长度,如translate(“abcdef-abcdef”,“abcdef”,“123456”);替换不是说把"abcdef"替换成"123456",而是把a替换成1,把b替换成2,把c替换成3,把d替换成4,e替换成5,f替换成6。
0: jdbc:hive2://10.9.4.117:10000> select translate("abcdef-abcdef","abcdef","123456");
+----------------+--+
| _c0 |
+----------------+--+
| 123456-123456 |
+----------------+--+
1 row selected (0.525 seconds)select TRANSLATE('abcdef-abcdef','abcd','1234'),--1234ef-1234ef
TRANSLATE('abcdef-abcdef','ab','12'), --12cdef-12cdef
TRANSLATE('abcdef-abcdef','ad','14'), --1bc4ef-1bc4ef
TRANSLATE('abcdef-abcdef','da','41') --1bc4ef-1bc4ef
0: jdbc:hive2://10.9.4.117:10000> select TRANSLATE('abcdef-abcdef','abcd','1234'),--1234ef-1234ef
. . . . . . . . . . . . . . . . > TRANSLATE('abcdef-abcdef','ab','12'), --12cdef-12cdef
. . . . . . . . . . . . . . . . > TRANSLATE('abcdef-abcdef','ad','14'), --1bc4ef-1bc4ef
. . . . . . . . . . . . . . . . > TRANSLATE('abcdef-abcdef','da','41') --1bc4ef-1bc4ef;
+----------------+----------------+----------------+----------------+--+
| _c0 | _c1 | _c2 | _c3 |
+----------------+----------------+----------------+----------------+--+
| 1234ef-1234ef | 12cdef-12cdef | 1bc4ef-1bc4ef | 1bc4ef-1bc4ef |
+----------------+----------------+----------------+----------------+--+
1 row selected (0.961 seconds)2.如果 from 字符串长度>to的字符串长度 ,例如TRANSLATE(‘abcdef-abcdef’,‘adbc’,‘123’) 意思是把 a替换为1,b替换为2,c替换为3,d替换为空,即删除掉。
select TRANSLATE('abcdef-abcdef','abcd','123'), --123ef-123ef
TRANSLATE('abcdef-abcdef','adbc','123') --132ef-132ef
+--------------+--------------+--+
| _c0 | _c1 |
+--------------+--------------+--+
| 123ef-123ef | 132ef-132ef |
+--------------+--------------+--+3.如果 from里有重复字符 比如abca,1231,重复的字符a对应to的替换不会起作用
如下所示:
0: jdbc:hive2://10.9.4.117:10000> select TRANSLATE ('abcdaabbaaabbb','aa','12')--1bcd11bb111bbb ;
+-----------------+--+
| _c0 |
+-----------------+--+
| 1bcd11bb111bbb |
+-----------------+--+
1 row selected (0.936 seconds)4 from长度<to的长度,不报错但是to里面长的字符没有意义。
select TRANSLATE ('abcdaabbaaabbb','a','123')--1bcd11bb111bbb+-----------------+--+
| _c0 |
+-----------------+--+
| 1bcd11bb111bbb |
+-----------------+--+针对本题 我们用translate()函数如下:
select translate("ABCDEFGHIOUUFEBCAADDEEII","AEIOU","")计算结果如下:
+--------------+--+
| _c0 |
+--------------+--+
| BCDFGHFBCDD |
+--------------+--+
1 row selected (0.193 seconds)如果上述不太理解的同学也可以用如下方法:
select replace(translate("ABCDEFGHIOUUFEBCAADDEEII","AEIOU","aaaaa"),'a','')解释:上述表示将需要删除的字符AEIOU先用字符a代替,然后利用replace()函数删除
结果如下:
+--------------+--+
| _c0 |
+--------------+--+
| BCDFGHFBCDD |
+--------------+--+注意translate()与replace()替换的区别:
- translate():为字符替换只要字符在输入串中匹配到就会将该字符按照to中对应的字符替换。from 与to:位置上一一对应,单个字符替换
- replace():是严格按照from指定的字符串替换成to中的字符串。from与to:没有位置上对应关系,就是整体串上的替换
2-总结
本文讲解了如何用HQL删除字符串中多余的字符的技巧和方法,该功能主要采用translate()函数进行实现,文章中对该函数的用法进行了详细的分析和研究,并对该函数与replace()的区别进行了分析,本文的技巧对一些非指定或固定顺序的字符删除上特别有用。
















