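Spark built-in functions
Table of contents
- Spark built-in functions
- Numeric functions
- Logical NOT
- Logical OR
- Not equal
- Bitwise NOT
- Remainder
- Bitwise AND
- Multiplication
- Addition
- Subtraction
- Division
- Less than
- Less than or equal
- Null-safe equal (<=>)
- Equal (=)
- Equal (==)
- Greater than
- Greater than or equal
- Bitwise XOR
- Bitwise OR
- Absolute value
- Add months
- Average
- Between
- Rounding (HALF_EVEN)
- Factorial
- Largest integer not greater than the given value
- Number formatting
- Greatest value
- Maximum (max)
- Least value
- Minimum (min)
- Random value (0-1)
- Uniformly distributed random value (0-1)
- Square root
- Population standard deviation
- Sample standard deviation
- Sum
- Array functions
- Array contains
- Array deduplication
- Array difference
- Array intersection
- Array join
- Array maximum
- Array minimum
- Remove a value from an array
- Repeat an element into an array
- Array sort
- Array union
- Filter
- Flatten nested arrays
- String functions
- ASCII code of the first character
- ASCII character
- base64
- bigint
- Binary representation
- binary
- Bitwise AND of non-null values
- Bit length
- String/byte length
- String concatenation
- String concatenation (||)
- Concatenation with a separator
- Find in set
- String formatting
- Parse a CSV string
- Parse a JSON string
- Get a value from JSON
- Hash value
- Hexadecimal
- if
- ifnull
- in
- Capitalize each word
- Substring index
- isnan
- isnotnull
- isnull
- Java method call via reflection
- String length
- md5
- nvl
- Repeat a string n times
- String replacement
- Reverse a string or array
- Rightmost n characters
- Substring
- sha
- sha1
- sha2
- String split
- Trim
- Decode base64
- Decode hex
- uuid
- Date functions
- Current timestamp
- Current time zone
- Cast to date
- Add days to a date
- Date formatting
- Date from days since 1970-01-01
- Days between two dates
- Day of month
- Day of week
- Day of year
- Year of a date
- Extract
- Format a UNIX time
- Hour of a timestamp
- Minute of a timestamp
- Month of a date
- Current time (now)
- Microseconds to timestamp
- Milliseconds to timestamp
- Seconds to timestamp
- Parse to date
- Parse to timestamp
- Parse to UNIX timestamp
- Convert to UTC timestamp
- Type conversion functions
- Cast to decimal
- Character set encoding
- Character set decoding
- Radians to degrees
- Cast to double
- Cast to float
- Cast to int
- Cast to string
- Cast to timestamp
- Cast to tinyint
- Euler's number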
Numeric functions
Logical NOT
! expr - Logical not.
Examples:
> SELECT ! true;
false
> SELECT ! false;
true
> SELECT ! NULL;
NULL
Logical OR
expr1 or expr2 - Logical OR.
Examples:
> SELECT true or false;
true
> SELECT false or false;
false
> SELECT true or NULL;
true
> SELECT false or NULL;
NULL
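The NULL results above follow SQL's three-valued logic. The sketch below is a plain-Python stand-in, not Spark code: `None` plays the role of NULL, and the helper names `sql_or` and `sql_not` are made up for illustration.

```python
from typing import Optional

def sql_or(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """Three-valued OR; None stands in for SQL NULL (illustrative helper)."""
    if a is True or b is True:
        return True   # one TRUE operand decides the result
    if a is None or b is None:
        return None   # an unknown operand and nothing forced TRUE
    return False

def sql_not(a: Optional[bool]) -> Optional[bool]:
    """Three-valued NOT: NULL stays NULL."""
    return None if a is None else (not a)

print(sql_or(True, None))   # mirrors: SELECT true or NULL
print(sql_or(False, None))  # mirrors: SELECT false or NULL
print(sql_not(None))        # mirrors: SELECT ! NULL
```

`true or NULL` is true because the unknown operand cannot change the outcome, while `false or NULL` stays unknown.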
Not equal
expr1 != expr2 - Returns true if expr1 is not equal to expr2, or false otherwise.
Arguments:
- expr1, expr2 - the two expressions must be the same type or castable to a common type, and must be a type that can be used in equality comparison. Map type is not supported. For complex types such as array/struct, the data types of fields must be orderable.
Examples:
> SELECT 1 != 2;
true
> SELECT 1 != '2';
true
> SELECT true != NULL;
NULL
> SELECT NULL != NULL;
NULL
Bitwise NOT
~ expr - Returns the result of bitwise NOT of expr.
Examples:
> SELECT ~ 0;
-1
Remainder
expr1 % expr2 - Returns the remainder after expr1/expr2.
Examples:
> SELECT 2 % 1.8;
0.2
> SELECT MOD(2, 1.8);
0.2
Bitwise AND
expr1 & expr2 - Returns the result of bitwise AND of expr1 and expr2.
Examples:
> SELECT 3 & 5;
1
Multiplication
expr1 * expr2 - Returns expr1*expr2.
Examples:
> SELECT 2 * 3;
6
Addition
expr1 + expr2 - Returns expr1+expr2.
Examples:
> SELECT 1 + 2;
3
Subtraction
expr1 - expr2 - Returns expr1-expr2.
Examples:
> SELECT 2 - 1;
1
Division
expr1 / expr2 - Returns expr1/expr2. It always performs floating point division.
Examples:
> SELECT 3 / 2;
1.5
> SELECT 2L / 2L;
1.0
Less than
expr1 < expr2 - Returns true if expr1 is less than expr2.
Arguments:
- expr1, expr2 - the two expressions must be the same type or castable to a common type, and must be a type that can be ordered. For example, map type is not orderable, so it is not supported. For complex types such as array/struct, the data types of fields must be orderable.
Examples:
> SELECT 1 < 2;
true
> SELECT 1.1 < '1';
false
> SELECT to_date('2009-07-30 04:17:52') < to_date('2009-07-30 04:17:52');
false
> SELECT to_date('2009-07-30 04:17:52') < to_date('2009-08-01 04:17:52');
true
> SELECT 1 < NULL;
NULL
Less than or equal
expr1 <= expr2 - Returns true if expr1 is less than or equal to expr2.
Arguments:
- expr1, expr2 - the two expressions must be the same type or castable to a common type, and must be a type that can be ordered. For example, map type is not orderable, so it is not supported. For complex types such as array/struct, the data types of fields must be orderable.
Examples:
> SELECT 2 <= 2;
true
> SELECT 1.0 <= '1';
true
> SELECT to_date('2009-07-30 04:17:52') <= to_date('2009-07-30 04:17:52');
true
> SELECT to_date('2009-07-30 04:17:52') <= to_date('2009-08-01 04:17:52');
true
> SELECT 1 <= NULL;
NULL
Null-safe equal (<=>)
expr1 <=> expr2 - Returns the same result as the EQUAL(=) operator for non-null operands, but returns true if both are null and false if only one of them is null.
Arguments:
- expr1, expr2 - the two expressions must be the same type or castable to a common type, and must be a type that can be used in equality comparison. Map type is not supported. For complex types such as array/struct, the data types of fields must be orderable.
Examples:
> SELECT 2 <=> 2;
true
> SELECT 1 <=> '1';
true
> SELECT true <=> NULL;
false
> SELECT NULL <=> NULL;
true
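The difference between `=` and `<=>` only shows up when an operand is NULL. A minimal Python sketch of the two behaviours, with `None` standing in for NULL; the helper names `sql_eq` and `null_safe_eq` are hypothetical, and Spark's implicit casts (such as `1 = '1'`) are deliberately not modelled:

```python
def sql_eq(a, b):
    """Ordinary '=': any NULL operand makes the result NULL (None)."""
    if a is None or b is None:
        return None
    return a == b

def null_safe_eq(a, b):
    """'<=>': never returns NULL; two NULLs compare as equal."""
    if a is None and b is None:
        return True
    if a is None or b is None:
        return False
    return a == b

for left, right in [(2, 2), (True, None), (None, None)]:
    print(left, right, sql_eq(left, right), null_safe_eq(left, right))
```

This is why `<=>` is the usual choice for join keys or change detection where NULL should match NULL.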
Equal (=)
expr1 = expr2 - Returns true if expr1 equals expr2, or false otherwise.
Arguments:
- expr1, expr2 - the two expressions must be the same type or castable to a common type, and must be a type that can be used in equality comparison. Map type is not supported. For complex types such as array/struct, the data types of fields must be orderable.
Examples:
> SELECT 2 = 2;
true
> SELECT 1 = '1';
true
> SELECT true = NULL;
NULL
> SELECT NULL = NULL;
NULL
Equal (==)
expr1 == expr2 - Returns true if expr1 equals expr2, or false otherwise.
Arguments:
- expr1, expr2 - the two expressions must be the same type or castable to a common type, and must be a type that can be used in equality comparison. Map type is not supported. For complex types such as array/struct, the data types of fields must be orderable.
Examples:
> SELECT 2 == 2;
true
> SELECT 1 == '1';
true
> SELECT true == NULL;
NULL
> SELECT NULL == NULL;
NULL
Greater than
expr1 > expr2 - Returns true if expr1 is greater than expr2.
Arguments:
- expr1, expr2 - the two expressions must be the same type or castable to a common type, and must be a type that can be ordered. For example, map type is not orderable, so it is not supported. For complex types such as array/struct, the data types of fields must be orderable.
Examples:
> SELECT 2 > 1;
true
> SELECT 2 > '1.1';
true
> SELECT to_date('2009-07-30 04:17:52') > to_date('2009-07-30 04:17:52');
false
> SELECT to_date('2009-07-30 04:17:52') > to_date('2009-08-01 04:17:52');
false
> SELECT 1 > NULL;
NULL
Greater than or equal
expr1 >= expr2 - Returns true if expr1 is greater than or equal to expr2.
Arguments:
- expr1, expr2 - the two expressions must be the same type or castable to a common type, and must be a type that can be ordered. For example, map type is not orderable, so it is not supported. For complex types such as array/struct, the data types of fields must be orderable.
Examples:
> SELECT 2 >= 1;
true
> SELECT 2.0 >= '2.1';
false
> SELECT to_date('2009-07-30 04:17:52') >= to_date('2009-07-30 04:17:52');
true
> SELECT to_date('2009-07-30 04:17:52') >= to_date('2009-08-01 04:17:52');
false
> SELECT 1 >= NULL;
NULL
Bitwise XOR
expr1 ^ expr2 - Returns the result of bitwise exclusive OR of expr1 and expr2.
Examples:
> SELECT 3 ^ 5;
6
Bitwise OR
expr1 | expr2 - Returns the result of bitwise OR of expr1 and expr2.
Examples:
> SELECT 3 | 5;
7
Absolute value
abs(expr) - Returns the absolute value of the numeric value.
Examples:
> SELECT abs(-1);
1
Add months
add_months(start_date, num_months) - Returns the date that is num_months after start_date.
Examples:
> SELECT add_months('2016-08-31', 1);
2016-09-30
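The example shows that when the target month is shorter, the day is pulled back to that month's last day. A rough Python sketch of that clamping rule follows; it is one reading of the example, not Spark's implementation, and other edge cases may differ between Spark versions.

```python
import calendar
from datetime import date

def add_months(start: date, num_months: int) -> date:
    """Shift by whole months, clamping the day to the target month's length.

    Illustrative stand-in only; the clamping rule is inferred from the example.
    """
    total = start.year * 12 + (start.month - 1) + num_months
    year, month_index = divmod(total, 12)   # floor division also handles negatives
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))

print(add_months(date(2016, 8, 31), 1))   # September has only 30 days
```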
Average
avg(expr) - Returns the mean calculated from values of a group.
Examples:
> SELECT avg(col) FROM VALUES (1), (2), (3) AS tab(col);
2.0
> SELECT avg(col) FROM VALUES (1), (2), (NULL) AS tab(col);
1.5
Between
expr1 [NOT] BETWEEN expr2 AND expr3 - Evaluates whether expr1 is [not] between expr2 and expr3.
Examples:
> SELECT col1 FROM VALUES 1, 3, 5, 7 WHERE col1 BETWEEN 2 AND 5;
3
5
Rounding (HALF_EVEN)
bround(expr, d) - Returns expr rounded to d decimal places using HALF_EVEN rounding mode.
Examples:
> SELECT bround(2.5, 0);
2
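HALF_EVEN (banker's rounding) sends a tie to the nearest even digit, which is why 2.5 becomes 2 rather than 3. The sketch below reproduces the rule with Python's `decimal` module; `bround` and `round_half_up` here are local helper names, not Spark calls, and values are passed as strings to avoid binary floating-point noise.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

def bround(value: str, d: int = 0) -> Decimal:
    """Round to d decimal places, ties going to the even neighbour."""
    quantum = Decimal(1).scaleb(-d)          # 10 ** -d, e.g. 0.01 for d=2
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN)

def round_half_up(value: str, d: int = 0) -> Decimal:
    """Schoolbook rounding, shown only for contrast."""
    quantum = Decimal(1).scaleb(-d)
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)

for v in ("2.5", "3.5"):
    print(v, bround(v), round_half_up(v))
```

The two modes differ only on exact ties; 3.5 rounds to 4 under both.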
Factorial
factorial(expr) - Returns the factorial of expr. expr must be in [0..20]; otherwise, null is returned.
Examples:
> SELECT factorial(5);
120
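The upper bound of 20 lines up with the range of a signed 64-bit integer: 20! still fits, 21! does not. That this is the reason for the limit is an inference, not something the text states. A small Python check, where `factorial` is a local stand-in that returns `None` in place of NULL:

```python
import math
from typing import Optional

LONG_MAX = 2**63 - 1   # largest signed 64-bit value

def factorial(n: Optional[int]) -> Optional[int]:
    """Mimics the documented range: NULL (None) outside [0, 20]."""
    if n is None or n < 0 or n > 20:
        return None
    return math.factorial(n)

print(factorial(5))
print(math.factorial(20) <= LONG_MAX < math.factorial(21))
```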
Largest integer not greater than the given value
floor(expr) - Returns the largest integer not greater than expr.
Examples:
> SELECT floor(-0.1);
-1
> SELECT floor(5);
5
Number formatting
format_number(expr1, expr2) - Formats the number expr1 like ‘#,###,###.##’, rounded to expr2 decimal places. If expr2 is 0, the result has no decimal point or fractional part. expr2 also accepts a user-specified format. This is supposed to function like MySQL’s FORMAT.
Examples:
> SELECT format_number(12332.123456, 4);
12,332.1235
> SELECT format_number(12332.123456, '##################.###');
12332.123
Greatest value
greatest(expr, …) - Returns the greatest value of all parameters, skipping null values.
Examples:
> SELECT greatest(10, 9, 2, 4, 3);
10
Maximum (max)
max(expr) - Returns the maximum value of expr.
Examples:
> SELECT max(col) FROM VALUES (10), (50), (20) AS tab(col);
50
Least value
least(expr, …) - Returns the least value of all parameters, skipping null values.
Examples:
> SELECT least(10, 9, 2, 4, 3);
2
Minimum (min)
min(expr) - Returns the minimum value of expr.
Examples:
> SELECT min(col) FROM VALUES (10), (-1), (20) AS tab(col);
-1
Random value (0-1)
rand([seed]) - Returns a random value with independent and identically distributed (i.i.d.) uniformly distributed values in [0, 1).
Examples:
> SELECT rand();
0.9629742951434543
> SELECT rand(0);
0.8446490682263027
> SELECT rand(null);
0.8446490682263027
Uniformly distributed random value (0-1)
random([seed]) - Returns a random value with independent and identically distributed (i.i.d.) uniformly distributed values in [0, 1).
Examples:
> SELECT random();
0.9629742951434543
> SELECT random(0);
0.8446490682263027
> SELECT random(null);
0.8446490682263027
Square root
sqrt(expr) - Returns the square root of expr.
Examples:
> SELECT sqrt(4);
2.0
Population standard deviation
stddev_pop(expr) - Returns the population standard deviation calculated from values of a group.
Examples:
> SELECT stddev_pop(col) FROM VALUES (1), (2), (3) AS tab(col);
0.816496580927726
Sample standard deviation
stddev_samp(expr) - Returns the sample standard deviation calculated from values of a group.
Examples:
> SELECT stddev_samp(col) FROM VALUES (1), (2), (3) AS tab(col);
1.0
Sum
sum(expr) - Returns the sum calculated from values of a group.
Examples:
> SELECT sum(col) FROM VALUES (5), (10), (15) AS tab(col);
30
> SELECT sum(col) FROM VALUES (NULL), (10), (15) AS tab(col);
25
> SELECT sum(col) FROM VALUES (NULL), (NULL) AS tab(col);
NULL
Array functions
Array contains
array_contains(array, value) - Returns true if the array contains the value.
Examples:
> SELECT array_contains(array(1, 2, 3), 2);
true
Array deduplication
array_distinct(array) - Removes duplicate values from the array.
Examples:
> SELECT array_distinct(array(1, 2, 3, null, 3));
[1,2,3,null]
Array difference
array_except(array1, array2) - Returns an array of the elements in array1 but not in array2, without duplicates.
Examples:
> SELECT array_except(array(1, 2, 3), array(1, 3, 5));
[2]
Array intersection
array_intersect(array1, array2) - Returns an array of the elements in the intersection of array1 and array2, without duplicates.
Examples:
> SELECT array_intersect(array(1, 2, 3), array(1, 3, 5));
[1,3]
Array join
array_join(array, delimiter[, nullReplacement]) - Concatenates the elements of the given array using the delimiter and an optional string to replace nulls. If no value is set for nullReplacement, any null value is filtered.
Examples:
> SELECT array_join(array('hello', 'world'), ' ');
hello world
> SELECT array_join(array('hello', null ,'world'), ' ');
hello world
> SELECT array_join(array('hello', null ,'world'), ' ', ',');
hello , world
Array maximum
array_max(array) - Returns the maximum value in the array. NULL elements are skipped.
Examples:
> SELECT array_max(array(1, 20, null, 3));
20
Array minimum
array_min(array) - Returns the minimum value in the array. NULL elements are skipped.
Examples:
> SELECT array_min(array(1, 20, null, 3));
1
Remove a value from an array
array_remove(array, element) - Removes all elements equal to element from array.
Examples:
> SELECT array_remove(array(1, 2, 3, null, 3), 3);
[1,2,null]
Repeat an element into an array
array_repeat(element, count) - Returns the array containing element count times.
Examples:
> SELECT array_repeat('123', 2);
["123","123"]
Array sort
array_sort(expr, func) - Sorts the input array. If func is omitted, sort in ascending order. The elements of the input array must be orderable. Null elements will be placed at the end of the returned array. Since 3.0.0 this function also sorts and returns the array based on the given comparator function. The comparator will take two arguments representing two elements of the array. It returns -1, 0, or 1 as the first element is less than, equal to, or greater than the second element. If the comparator function returns other values (including null), the function will fail and raise an error.
Examples:
> SELECT array_sort(array(5, 6, 1), (left, right) -> case when left < right then -1 when left > right then 1 else 0 end);
[1,5,6]
> SELECT array_sort(array('bc', 'ab', 'dc'), (left, right) -> case when left is null and right is null then 0 when left is null then -1 when right is null then 1 when left < right then 1 when left > right then -1 else 0 end);
["dc","bc","ab"]
> SELECT array_sort(array('b', 'd', null, 'c', 'a'));
["a","b","c","d",null]
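The comparator contract described above (return -1, 0, or 1) is the same one used by classic compare functions. The following Python sketch mirrors the three examples with `functools.cmp_to_key`; the names `array_sort` and `descending_nulls_first` are illustrative stand-ins, and `None` plays the role of null.

```python
from functools import cmp_to_key

def array_sort(items, comparator=None):
    """Without a comparator: ascending order with nulls (None) placed last."""
    if comparator is None:
        non_null = sorted(x for x in items if x is not None)
        return non_null + [None] * sum(x is None for x in items)
    return sorted(items, key=cmp_to_key(comparator))

def descending_nulls_first(left, right):
    """Same branch order as the second SQL example's CASE expression."""
    if left is None and right is None:
        return 0
    if left is None:
        return -1
    if right is None:
        return 1
    if left < right:
        return 1
    if left > right:
        return -1
    return 0

print(array_sort([5, 6, 1], lambda l, r: (l > r) - (l < r)))
print(array_sort(["bc", "ab", "dc"], descending_nulls_first))
print(array_sort(["b", "d", None, "c", "a"]))
```

Swapping the signs in the comparator is all it takes to reverse the order, which is what the second example does.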
Array union
array_union(array1, array2) - Returns an array of the elements in the union of array1 and array2, without duplicates.
Examples:
> SELECT array_union(array(1, 2, 3), array(1, 3, 5));
[1,2,3,5]
Filter
filter(expr, func) - Filters the input array using the given predicate.
Examples:
> SELECT filter(array(1, 2, 3), x -> x % 2 == 1);
[1,3]
> SELECT filter(array(0, 2, 3), (x, i) -> x > i);
[2,3]
> SELECT filter(array(0, null, 2, 3, null), x -> x IS NOT NULL);
[0,2,3]
Flatten nested arrays
flatten(arrayOfArrays) - Transforms an array of arrays into a single array.
Examples:
> SELECT flatten(array(array(1, 2), array(3, 4)));
[1,2,3,4]
String functions
ASCII code of the first character
ascii(str) - Returns the numeric value of the first character of str.
Examples:
> SELECT ascii('222');
50
> SELECT ascii(2);
50
ASCII character
char(expr) - Returns the ASCII character having the binary equivalent to expr. If expr is larger than 256, the result is equivalent to chr(expr % 256).
Examples:
> SELECT char(65);
A
base64
base64(bin) - Converts the argument from a binary bin to a base 64 string.
Examples:
> SELECT base64('Spark SQL');
U3BhcmsgU1FM
bigint
bigint(expr) - Casts the value expr to the target data type bigint.
Binary representation
bin(expr) - Returns the string representation of the long value expr represented in binary.
Examples:
> SELECT bin(13);
1101
> SELECT bin(-13);
1111111111111111111111111111111111111111111111111111111111110011
> SELECT bin(13.3);
1101
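The 64-character result for `bin(-13)` is the two's-complement form of a 64-bit long. A small Python sketch of that view, where `spark_bin` is a hypothetical helper that only handles integers (the truncation of 13.3 to 13 is not modelled):

```python
def spark_bin(value: int) -> str:
    """Binary string of value viewed as a 64-bit two's-complement long."""
    return format(value & 0xFFFFFFFFFFFFFFFF, "b")

print(spark_bin(13))
print(spark_bin(-13))
print(len(spark_bin(-13)))   # negative values always fill all 64 bits
```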
binary
binary(expr) - Casts the value expr to the target data type binary.
Bitwise AND of non-null values
bit_and(expr) - Returns the bitwise AND of all non-null input values, or null if none.
Examples:
> SELECT bit_and(col) FROM VALUES (3), (5) AS tab(col);
1
Bit length
bit_length(expr) - Returns the bit length of string data or number of bits of binary data.
Examples:
> SELECT bit_length('Spark SQL');
72
String/byte length
char_length(expr) - Returns the character length of string data or number of bytes of binary data. The length of string data includes the trailing spaces. The length of binary data includes binary zeros.
character_length(expr) - Same as char_length(expr).
Examples:
> SELECT char_length('Spark SQL ');
10
> SELECT CHAR_LENGTH('Spark SQL ');
10
> SELECT CHARACTER_LENGTH('Spark SQL ');
10
String concatenation
concat(col1, col2, …, colN) - Returns the concatenation of col1, col2, …, colN.
Examples:
> SELECT concat('Spark', 'SQL');
SparkSQL
> SELECT concat(array(1, 2, 3), array(4, 5), array(6));
[1,2,3,4,5,6]
String concatenation (||)
expr1 || expr2 - Returns the concatenation of expr1 and expr2.
Examples:
> SELECT 'Spark' || 'SQL';
SparkSQL
> SELECT array(1, 2, 3) || array(4, 5) || array(6);
[1,2,3,4,5,6]
Concatenation with a separator
concat_ws(sep[, str | array(str)]+) - Returns the concatenation of the strings separated by sep.
Examples:
> SELECT concat_ws(' ', 'Spark', 'SQL');
Spark SQL
> SELECT concat_ws('s');
Find in set
find_in_set(str, str_array) - Returns the index (1-based) of the given string (str) in the comma-delimited list (str_array). Returns 0 if the string was not found or if the given string (str) contains a comma.
Examples:
> SELECT find_in_set('ab','abc,b,ab,c,def');
3
String formatting
format_string(strfmt, obj, …) - Returns a formatted string from printf-style format strings.
Examples:
> SELECT format_string("Hello World %d %s", 100, "days");
Hello World 100 days
Parse a CSV string
from_csv(csvStr, schema[, options]) - Returns a struct value with the given csvStr and schema.
Examples:
> SELECT from_csv('1, 0.8', 'a INT, b DOUBLE');
{"a":1,"b":0.8}
> SELECT from_csv('26/08/2015', 'time Timestamp', map('timestampFormat', 'dd/MM/yyyy'));
{"time":2015-08-26 00:00:00}
Parse a JSON string
from_json(jsonStr, schema[, options]) - Returns a struct value with the given jsonStr and schema.
Examples:
> SELECT from_json('{"a":1, "b":0.8}', 'a INT, b DOUBLE');
{"a":1,"b":0.8}
> SELECT from_json('{"time":"26/08/2015"}', 'time Timestamp', map('timestampFormat', 'dd/MM/yyyy'));
{"time":2015-08-26 00:00:00}
Get a value from JSON
get_json_object(json_txt, path) - Extracts a json object from path.
Examples:
> SELECT get_json_object('{"a":"b"}', '$.a');
b
Hash value
hash(expr1, expr2, …) - Returns a hash value of the arguments.
Examples:
> SELECT hash('Spark', array(123), 2);
-1321691492
Hexadecimal
hex(expr) - Converts expr to hexadecimal.
Examples:
> SELECT hex(17);
11
> SELECT hex('Spark SQL');
537061726B2053514C
if
if(expr1, expr2, expr3) - If expr1 evaluates to true, then returns expr2; otherwise returns expr3.
Examples:
> SELECT if(1 < 2, 'a', 'b');
a
ifnull
ifnull(expr1, expr2) - Returns expr2 if expr1 is null, or expr1 otherwise.
Examples:
> SELECT ifnull(NULL, array('2'));
["2"]
in
expr1 in(expr2, expr3, …) - Returns true if expr1 equals any of expr2, expr3, ….
Arguments:
- expr1, expr2, expr3, … - the arguments must be the same type.
Examples:
> SELECT 1 in(1, 2, 3);
true
> SELECT 1 in(2, 3, 4);
false
> SELECT named_struct('a', 1, 'b', 2) in(named_struct('a', 1, 'b', 1), named_struct('a', 1, 'b', 3));
false
> SELECT named_struct('a', 1, 'b', 2) in(named_struct('a', 1, 'b', 2), named_struct('a', 1, 'b', 3));
true
Capitalize each word
initcap(str) - Returns str with the first letter of each word in uppercase. All other letters are in lowercase. Words are delimited by white space.
Examples:
> SELECT initcap('sPark sql');
Spark Sql
Substring index
instr(str, substr) - Returns the (1-based) index of the first occurrence of substr in str.
Examples:
> SELECT instr('SparkSQL', 'SQL');
6
isnan
isnan(expr) - Returns true if expr is NaN, or false otherwise.
Examples:
> SELECT isnan(cast('NaN' as double));
true
isnotnull
isnotnull(expr) - Returns true if expr is not null, or false otherwise.
Examples:
> SELECT isnotnull(1);
true
isnull
isnull(expr) - Returns true if expr is null, or false otherwise.
Examples:
> SELECT isnull(1);
false
Java method call via reflection
java_method(class, method[, arg1[, arg2 …]]) - Calls a method with reflection.
Examples:
> SELECT java_method('java.util.UUID', 'randomUUID');
c33fb387-8500-4bfa-81d2-6e0e3e930df2
> SELECT java_method('java.util.UUID', 'fromString', 'a5cf6c42-0c85-418f-af6c-3e4e5b1328f2');
a5cf6c42-0c85-418f-af6c-3e4e5b1328f2
String length
length(expr) - Returns the character length of string data or number of bytes of binary data. The length of string data includes the trailing spaces. The length of binary data includes binary zeros.
Examples:
> SELECT length('Spark SQL ');
10
> SELECT CHAR_LENGTH('Spark SQL ');
10
> SELECT CHARACTER_LENGTH('Spark SQL ');
10
md5
md5(expr) - Returns an MD5 128-bit checksum as a hex string of expr.
Examples:
> SELECT md5('Spark');
8cde774d6f7333752ed72cacddb05126
nvl
nvl(expr1, expr2) - Returns expr2 if expr1 is null, or expr1 otherwise.
Examples:
> SELECT nvl(NULL, array('2'));
["2"]
Repeat a string n times
repeat(str, n) - Returns the string which repeats the given string value n times.
Examples:
> SELECT repeat('123', 2);
123123
String replacement
replace(str, search[, replace]) - Replaces all occurrences of search with replace.
Arguments:
- str - a string expression
- search - a string expression. If search is not found in str, str is returned unchanged.
- replace - a string expression. If replace is not specified or is an empty string, nothing replaces the string that is removed from str.
Examples:
> SELECT replace('ABCabc', 'abc', 'DEF');
ABCDEF
Reverse a string or array
reverse(expr) - Returns a reversed string or an array with reverse order of elements.
Examples:
> SELECT reverse('Spark SQL');
LQS krapS
> SELECT reverse(array(2, 1, 4, 3));
[3,4,1,2]
Rightmost n characters
right(str, len) - Returns the rightmost len (len can be string type) characters from the string str. If len is less than or equal to 0, the result is an empty string.
Examples:
> SELECT right('Spark SQL', 3);
SQL
Substring
substr(str, pos[, len]) - Returns the substring of str that starts at pos and is of length len, or the slice of byte array that starts at pos and is of length len.
substr(str FROM pos[ FOR len]) - Returns the substring of str that starts at pos and is of length len, or the slice of byte array that starts at pos and is of length len.
Examples:
> SELECT substr('Spark SQL', 5);
k SQL
> SELECT substr('Spark SQL', -3);
SQL
> SELECT substr('Spark SQL', 5, 1);
k
> SELECT substr('Spark SQL' FROM 5);
k SQL
> SELECT substr('Spark SQL' FROM -3);
SQL
> SELECT substr('Spark SQL' FROM 5 FOR 1);
k
sha
sha(expr) - Returns a sha1 hash value as a hex string of the expr.
Examples:
> SELECT sha('Spark');
85f5955f4b27a9a4c2aab6ffe5d7189fc298b92c
sha1
sha1(expr) - Returns a sha1 hash value as a hex string of the expr.
Examples:
> SELECT sha1('Spark');
85f5955f4b27a9a4c2aab6ffe5d7189fc298b92c
sha2
sha2(expr, bitLength) - Returns a checksum of SHA-2 family as a hex string of expr. SHA-224, SHA-256, SHA-384, and SHA-512 are supported. Bit length of 0 is equivalent to 256.
Examples:
> SELECT sha2('Spark', 256);
529bc3b07127ecb7e53a4dcf1991d9152c24537d919178022b2c42657f79a26b
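The bitLength argument simply selects one member of the SHA-2 family. A minimal Python sketch with `hashlib`, assuming UTF-8 input; the local helper `sha2` and its `None` return for unsupported lengths are assumptions made for illustration, not documented behaviour.

```python
import hashlib

# Bit length 0 is treated as 256, as the description states.
_SHA2 = {
    0: hashlib.sha256,
    224: hashlib.sha224,
    256: hashlib.sha256,
    384: hashlib.sha384,
    512: hashlib.sha512,
}

def sha2(text: str, bit_length: int):
    """Hex digest of the chosen SHA-2 variant; None for other lengths (assumption)."""
    algo = _SHA2.get(bit_length)
    if algo is None:
        return None
    return algo(text.encode("utf-8")).hexdigest()

print(sha2("Spark", 256))
print(sha2("Spark", 0) == sha2("Spark", 256))
```

Each hex digest has bitLength / 4 characters, so a 256-bit digest is 64 characters long.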
String split
split(str, regex, limit) - Splits str around occurrences that match regex and returns an array with a length of at most limit.
Arguments:
- str - a string expression to split.
- regex - a string representing a regular expression. The regex string should be a Java regular expression.
- limit - an integer expression which controls the number of times the regex is applied.
- limit > 0: The resulting array’s length will not be more than limit, and the resulting array’s last entry will contain all input beyond the last matched regex.
- limit <= 0: regex will be applied as many times as possible, and the resulting array can be of any size.
Examples:
> SELECT split('oneAtwoBthreeC', '[ABC]');
["one","two","three",""]
> SELECT split('oneAtwoBthreeC', '[ABC]', -1);
["one","two","three",""]
> SELECT split('oneAtwoBthreeC', '[ABC]', 2);
["one","twoBthreeC"]
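The effect of limit can be imitated with Python's `re.split`, whose `maxsplit` counts splits rather than resulting pieces, so a positive limit maps to `limit - 1`. This is only an approximation: Python and Java regular expressions differ in places, and the local `split` helper below is not the Spark function.

```python
import re

def split(text: str, regex: str, limit: int = -1) -> list:
    """Approximate the documented limit semantics with Python's re module."""
    if limit == 1:
        return [text]                      # maxsplit=0 would mean "unlimited" in Python
    if limit > 1:
        return re.split(regex, text, maxsplit=limit - 1)
    return re.split(regex, text)           # limit <= 0: split as often as possible

print(split("oneAtwoBthreeC", "[ABC]"))
print(split("oneAtwoBthreeC", "[ABC]", 2))
```

The trailing empty string in the unlimited case comes from the final `C`, which matches the delimiter with nothing after it.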
Trim
trim(str) - Removes the leading and trailing space characters from str.
trim(BOTH FROM str) - Removes the leading and trailing space characters from str.
trim(LEADING FROM str) - Removes the leading space characters from str.
trim(TRAILING FROM str) - Removes the trailing space characters from str.
trim(trimStr FROM str) - Removes the leading and trailing trimStr characters from str.
trim(BOTH trimStr FROM str) - Removes the leading and trailing trimStr characters from str.
trim(LEADING trimStr FROM str) - Removes the leading trimStr characters from str.
trim(TRAILING trimStr FROM str) - Removes the trailing trimStr characters from str.
Arguments:
- str - a string expression
- trimStr - the trim string characters to trim, the default value is a single space
- BOTH, FROM - these are keywords to specify trimming string characters from both ends of the string
- LEADING, FROM - these are keywords to specify trimming string characters from the left end of the string
- TRAILING, FROM - these are keywords to specify trimming string characters from the right end of the string
Examples:
> SELECT trim(' SparkSQL ');
SparkSQL
> SELECT trim(BOTH FROM ' SparkSQL ');
SparkSQL
> SELECT trim(LEADING FROM ' SparkSQL ');
SparkSQL
> SELECT trim(TRAILING FROM ' SparkSQL ');
SparkSQL
> SELECT trim('SL' FROM 'SSparkSQLS');
parkSQ
> SELECT trim(BOTH 'SL' FROM 'SSparkSQLS');
parkSQ
> SELECT trim(LEADING 'SL' FROM 'SSparkSQLS');
parkSQLS
> SELECT trim(TRAILING 'SL' FROM 'SSparkSQLS');
SSparkSQ
Decode base64
unbase64(str) - Converts the argument from a base 64 string str to a binary.
Examples:
> SELECT unbase64('U3BhcmsgU1FM');
Spark SQL
Decode hex
unhex(expr) - Converts hexadecimal expr to binary.
Examples:
> SELECT decode(unhex('537061726B2053514C'), 'UTF-8');
Spark SQL
uuid
uuid() - Returns a universally unique identifier (UUID) string. The value is returned as a canonical UUID 36-character string.
Examples:
> SELECT uuid();
46707d92-02f4-4817-8116-a4c3b23e6266
Date functions
Current timestamp
current_timestamp() - Returns the current timestamp at the start of query evaluation. All calls of current_timestamp within the same query return the same value.
current_timestamp - Returns the current timestamp at the start of query evaluation.
Examples:
> SELECT current_timestamp();
2020-04-25 15:49:11.914
> SELECT current_timestamp;
2020-04-25 15:49:11.914
Current time zone
current_timezone() - Returns the current session local timezone.
Examples:
> SELECT current_timezone();
Asia/Shanghai
Cast to date
date(expr) - Casts the value expr to the target data type date.
Add days to a date
date_add(start_date, num_days) - Returns the date that is num_days after start_date.
Examples:
> SELECT date_add('2016-07-30', 1);
2016-07-31
Date formatting
date_format(timestamp, fmt) - Converts timestamp to a value of string in the format specified by the date format fmt.
Arguments:
- timestamp - A date/timestamp or string to be converted to the given format.
- fmt - Date/time format pattern to follow. See Datetime Patterns for valid date and time format patterns.
Examples:
> SELECT date_format('2016-04-08', 'y');
2016
Date from days since 1970-01-01
date_from_unix_date(days) - Create date from the number of days since 1970-01-01.
Examples:
> SELECT date_from_unix_date(1);
1970-01-02
Days between two dates
datediff(endDate, startDate) - Returns the number of days from startDate to endDate.
Examples:
> SELECT datediff('2009-07-31', '2009-07-30');
1
> SELECT datediff('2009-07-30', '2009-07-31');
-1
Day of month
day(date) - Returns the day of month of the date/timestamp.
Examples:
> SELECT day('2009-07-30');
30
Day of week
dayofweek(date) - Returns the day of the week for date/timestamp (1 = Sunday, 2 = Monday, …, 7 = Saturday).
Examples:
> SELECT dayofweek('2009-07-30');
5
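The numbering here starts the week on Sunday, unlike the ISO convention used by Python's `isoweekday()` (Monday = 1, Sunday = 7). The sketch below shows the conversion between the two; `dayofweek` is a local stand-in, not a Spark call.

```python
from datetime import date

def dayofweek(d: date) -> int:
    """Map ISO weekday (Mon=1..Sun=7) to the Sunday=1..Saturday=7 scheme."""
    return d.isoweekday() % 7 + 1

print(dayofweek(date(2009, 7, 30)))   # a Thursday
print(dayofweek(date(2005, 1, 2)))    # a Sunday
```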
Day of year
dayofyear(date) - Returns the day of year of the date/timestamp.
Examples:
> SELECT dayofyear('2016-04-09');
100
Year of a date
year(date) - Returns the year component of the date/timestamp.
Examples:
> SELECT year('2016-07-30');
2016
Extract
extract(field FROM source) - Extracts a part of the date/timestamp or interval source.
Arguments:
- field - selects which part of the source should be extracted
- Supported string values of field for dates and timestamps are(case insensitive):
- “YEAR”, (“Y”, “YEARS”, “YR”, “YRS”) - the year field
- “YEAROFWEEK” - the ISO 8601 week-numbering year that the datetime falls in. For example, 2005-01-02 is part of the 53rd week of year 2004, so the result is 2004
- “QUARTER”, (“QTR”) - the quarter (1 - 4) of the year that the datetime falls in
- “MONTH”, (“MON”, “MONS”, “MONTHS”) - the month field (1 - 12)
- “WEEK”, (“W”, “WEEKS”) - the number of the ISO 8601 week-of-week-based-year. A week is considered to start on a Monday and week 1 is the first week with >3 days. In the ISO week-numbering system, it is possible for early-January dates to be part of the 52nd or 53rd week of the previous year, and for late-December dates to be part of the first week of the next year. For example, 2005-01-02 is part of the 53rd week of year 2004, while 2012-12-31 is part of the first week of 2013
- “DAY”, (“D”, “DAYS”) - the day of the month field (1 - 31)
- “DAYOFWEEK”,(“DOW”) - the day of the week for datetime as Sunday(1) to Saturday(7)
- “DAYOFWEEK_ISO”,(“DOW_ISO”) - ISO 8601 based day of the week for datetime as Monday(1) to Sunday(7)
- “DOY” - the day of the year (1 - 365/366)
- “HOUR”, (“H”, “HOURS”, “HR”, “HRS”) - The hour field (0 - 23)
- “MINUTE”, (“M”, “MIN”, “MINS”, “MINUTES”) - the minutes field (0 - 59)
- “SECOND”, (“S”, “SEC”, “SECONDS”, “SECS”) - the seconds field, including fractional parts
- Supported string values of field for interval (which consists of months, days, microseconds) are (case insensitive):
- “YEAR”, (“Y”, “YEARS”, “YR”, “YRS”) - the total months / 12
- “MONTH”, (“MON”, “MONS”, “MONTHS”) - the total months % 12
- “DAY”, (“D”, “DAYS”) - the days part of interval
- “HOUR”, (“H”, “HOURS”, “HR”, “HRS”) - how many hours the microseconds contains
- “MINUTE”, (“M”, “MIN”, “MINS”, “MINUTES”) - how many minutes left after taking hours from microseconds
- “SECOND”, (“S”, “SEC”, “SECONDS”, “SECS”) - how many seconds with fractions left after taking hours and minutes from microseconds
- source - a date/timestamp or interval column from where field should be extracted
Examples:
> SELECT extract(YEAR FROM TIMESTAMP '2019-08-12 01:00:00.123456');
2019
> SELECT extract(week FROM timestamp'2019-08-12 01:00:00.123456');
33
> SELECT extract(doy FROM DATE'2019-08-12');
224
> SELECT extract(SECONDS FROM timestamp'2019-10-01 00:00:01.000001');
1.000001
> SELECT extract(days FROM interval 1 year 10 months 5 days);
5
> SELECT extract(seconds FROM interval 5 hours 30 seconds 1 milliseconds 1 microseconds);
30.001001
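The ISO 8601 fields (YEAROFWEEK, WEEK, DAYOFWEEK_ISO) are the least intuitive ones, because dates near New Year can belong to a week of the neighbouring year. Python's `date.isocalendar()` follows the same ISO rules, so it can be used to check the two dates cited in the field list. The helper names below are made up for this sketch.

```python
from datetime import date

def year_of_week(d: date) -> int:
    return d.isocalendar()[0]      # ISO week-numbering year

def week(d: date) -> int:
    return d.isocalendar()[1]      # ISO week number, weeks start on Monday

def dow_iso(d: date) -> int:
    return d.isocalendar()[2]      # Monday=1 .. Sunday=7

def doy(d: date) -> int:
    return d.timetuple().tm_yday   # day of the year, 1-based

for d in (date(2005, 1, 2), date(2012, 12, 31), date(2019, 8, 12)):
    print(d, year_of_week(d), week(d), dow_iso(d), doy(d))
```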
Format a UNIX time
from_unixtime(unix_time[, fmt]) - Returns unix_time in the specified fmt.
Arguments:
- unix_time - UNIX Timestamp to be converted to the provided format.
- fmt - Date/time format pattern to follow. See Datetime Patterns for valid date and time format patterns. The ‘yyyy-MM-dd HH:mm:ss’ pattern is used if omitted.
Examples:
> SELECT from_unixtime(0, 'yyyy-MM-dd HH:mm:ss');
1969-12-31 16:00:00
> SELECT from_unixtime(0);
1969-12-31 16:00:00
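The output 1969-12-31 16:00:00 for epoch second 0 implies the example was run in a session whose time zone is 8 hours behind UTC; in another session time zone the same call prints a different wall-clock time. A Python sketch using a fixed UTC offset as a stand-in for the session time zone (the `utc_offset_hours` parameter is an assumption of this sketch, and Python's strftime patterns are not Spark's datetime patterns):

```python
from datetime import datetime, timedelta, timezone

def from_unixtime(unix_time: int, utc_offset_hours: int,
                  fmt: str = "%Y-%m-%d %H:%M:%S") -> str:
    """Render epoch seconds as local wall-clock text for a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(unix_time, tz=tz).strftime(fmt)

print(from_unixtime(0, -8))   # the offset implied by the example output
print(from_unixtime(0, 8))    # e.g. a UTC+8 session
```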
Hour of a timestamp
hour(timestamp) - Returns the hour component of the string/timestamp.
Examples:
> SELECT hour('2009-07-30 12:58:59');
12
Minute of a timestamp
minute(timestamp) - Returns the minute component of the string/timestamp.
Examples:
> SELECT minute('2009-07-30 12:58:59');
58
Month of a date
month(date) - Returns the month component of the date/timestamp.
Examples:
> SELECT month('2016-07-30');
7
Current time (now)
now() - Returns the current timestamp at the start of query evaluation.
Examples:
> SELECT now();
2020-04-25 15:49:11.914
Microseconds to timestamp
timestamp_micros(microseconds) - Creates timestamp from the number of microseconds since UTC epoch.
Examples:
> SELECT timestamp_micros(1230219000123123);
2008-12-25 07:30:00.123123
Milliseconds to timestamp
timestamp_millis(milliseconds) - Creates timestamp from the number of milliseconds since UTC epoch.
Examples:
> SELECT timestamp_millis(1230219000123);
2008-12-25 07:30:00.123
Seconds to timestamp
timestamp_seconds(seconds) - Creates timestamp from the number of seconds (can be fractional) since UTC epoch.
Examples:
> SELECT timestamp_seconds(1230219000);
2008-12-25 07:30:00
> SELECT timestamp_seconds(1230219000.123);
2008-12-25 07:30:00.123
Parse to date
to_date(date_str[, fmt]) - Parses the date_str expression with the fmt expression to a date. Returns null with invalid input. By default, it follows casting rules to a date if the fmt is omitted.
Arguments:
- date_str - A string to be parsed to date.
- fmt - Date format pattern to follow. See Datetime Patterns for valid date and time format patterns.
Examples:
> SELECT to_date('2009-07-30 04:17:52');
2009-07-30
> SELECT to_date('2016-12-31', 'yyyy-MM-dd');
2016-12-31
Parse to timestamp
to_timestamp(timestamp_str[, fmt]) - Parses the timestamp_str expression with the fmt expression to a timestamp. Returns null with invalid input. By default, it follows casting rules to a timestamp if the fmt is omitted.
Arguments:
- timestamp_str - A string to be parsed to timestamp.
- fmt - Timestamp format pattern to follow. See Datetime Patterns for valid date and time format patterns.
Examples:
> SELECT to_timestamp('2016-12-31 00:12:00');
2016-12-31 00:12:00
> SELECT to_timestamp('2016-12-31', 'yyyy-MM-dd');
2016-12-31 00:00:00
Parse to UNIX timestamp
to_unix_timestamp(timeExp[, fmt]) - Returns the UNIX timestamp of the given time.
Arguments:
- timeExp - A date/timestamp or string which is returned as a UNIX timestamp.
- fmt - Date/time format pattern to follow. Ignored if timeExp is not a string. Default value is “yyyy-MM-dd HH:mm:ss”. See Datetime Patterns for valid date and time format patterns.
Examples:
> SELECT to_unix_timestamp('2016-04-08', 'yyyy-MM-dd');
1460098800
Convert to UTC timestamp
to_utc_timestamp(timestamp, timezone) - Given a timestamp like ‘2017-07-14 02:40:00.0’, interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, ‘GMT+1’ would yield ‘2017-07-14 01:40:00.0’.
Examples:
> SELECT to_utc_timestamp('2016-08-31', 'Asia/Seoul');
2016-08-30 15:00:00
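In other words, the input is read as wall-clock time in the given zone and the zone's offset is subtracted to reach UTC. A Python sketch with fixed offsets (Asia/Seoul is UTC+9); `to_utc_timestamp` and its `utc_offset_hours` parameter are stand-ins for illustration, and named zones with daylight saving rules are not modelled.

```python
from datetime import datetime, timedelta, timezone

def to_utc_timestamp(local: datetime, utc_offset_hours: int) -> datetime:
    """Interpret a naive datetime at a fixed UTC offset and return naive UTC."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return local.replace(tzinfo=tz).astimezone(timezone.utc).replace(tzinfo=None)

print(to_utc_timestamp(datetime(2016, 8, 31), 9))          # Asia/Seoul, UTC+9
print(to_utc_timestamp(datetime(2017, 7, 14, 2, 40), 1))   # the 'GMT+1' case
```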
Type conversion functions
Cast to decimal
decimal(expr) - Casts the value expr to the target data type decimal.
Character set encoding
encode(str, charset) - Encodes the first argument using the second argument character set.
Examples:
> SELECT encode('abc', 'utf-8');
abc
Character set decoding
decode(bin, charset) - Decodes the first argument using the second argument character set.
Examples:
> SELECT decode(encode('abc', 'utf-8'), 'utf-8');
abc
Radians to degrees
degrees(expr) - Converts radians to degrees.
Arguments:
- expr - angle in radians
Examples:
> SELECT degrees(3.141592653589793);
180.0
Cast to double
double(expr) - Casts the value expr to the target data type double.
Cast to float
float(expr) - Casts the value expr to the target data type float.
Cast to int
int(expr) - Casts the value expr to the target data type int.
Cast to string
string(expr) - Casts the value expr to the target data type string.
Cast to timestamp
timestamp(expr) - Casts the value expr to the target data type timestamp.
Cast to tinyint
tinyint(expr) - Casts the value expr to the target data type tinyint.
Euler's number
e() - Returns Euler’s number, e.
Examples:
> SELECT e();
2.718281828459045