首先说一下结论,hive字符串直接转timestamp可以这么写,这个结果返回的就是一个timestamp

select timestamp '2020-03-19T17:22:28.345';
+--------------------------+
| _c0 |
+--------------------------+
| 2020-03-19 17:22:28.345 |
+--------------------------+

下面从几个SQL来介绍一下hive中的时间转化问题,为了便于理解,所有SQL的时间都为

2020-03-19T17:22:28.345

涉及到的函数及概念


  • 时间戳
    也就是timestamp,是hive中的一种数据类型,与unix_timestamp不是一个概念
  • unix_timestamp
    可以将字符串转成毫秒(即1970-1-1至今的秒)
  • from_unixtime
    可以将unix_timestamp转成时间戳
  • to_utc_timestamp
    可以将13位的unix_timestamp(精确到毫秒)转成时间戳
  • current_timestamp()
    获取当前时间戳

1.hive unix_timestamp函数可以将日期字符串转成unix_timestamp(),但是只能精确到10位(秒)

select unix_timestamp("2020-03-19T17:22:28.345", "yyyy-MM-dd'T'HH:mm:ss.SSS");
+-------------+
| _c0 |
+-------------+
| 1584660148 |
+-------------+

可以看到,虽然我们传进来的日期字符串带有毫秒,这里只得到了10位的unix_timestamp

2.from_unixtime函数可以将unix_timestamp转成时间戳,精度也是到秒

select from_unixtime(1584660148);

select from_unixtime(unix_timestamp("2020-03-19T17:22:28.345", "yyyy-MM-dd'T'HH:mm:ss.SSS"));
+----------------------+
| _c0 |
+----------------------+
| 2020-03-19 17:22:28 |
+----------------------+

这里我拿一个精度为13位的unix_timestamp,这个13位的结果是我通过java程序得到的

select from_unixtime(1584638548345);
+-----------------------+
| _c0 |
+-----------------------+
| 52185-03-12 15:32:25 |
+-----------------------+

结果严重出错,说明from_unixtime不支持精确到毫秒

3.这会我们换一个函数,to_utc_timestamp函数可以解析带毫秒的unix_timestamp(13位的)

select to_utc_timestamp(1584638548345, 'UTC');
+--------------------------+
| _c0 |
+--------------------------+
| 2020-03-19 17:22:28.345 |
+--------------------------+

如果像上面那样嵌套着直接通过一个字符串转成时间戳,需要乘1000,缺点是也拿不到毫秒

官方文档是这么写的:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

Converts a timestamp* in UTC to a given timezone (as of Hive 0.8.0).

* timestamp is a primitive type, including timestamp/date, tinyint/smallint/int/bigint, float/double and decimal.

Fractional values are considered as seconds. Integer values are considered as milliseconds. For example, from_utc_timestamp(2592000.0,'PST'), from_utc_timestamp(2592000000,'PST') and from_utc_timestamp(timestamp '1970-01-30 16:00:00','PST') all return the timestamp 1970-01-30 08:00:00.

意思是要做一下单位转换,由于下面这个SQL的结果与系统时区有关,先看一下自己的时区

timedatectl
Local time: Fri 2021-03-19 05:32:41 MDT
Universal time: Fri 2021-03-19 11:32:41 UTC
RTC time: Fri 2021-03-19 11:32:41
Time zone: America/Denver (MDT, -0600)
NTP enabled: yes
NTP synchronized: yes
RTC in local TZ: no
DST active: yes
Last DST change: DST began at
Sun 2021-03-14 01:59:59 MST
Sun 2021-03-14 03:00:00 MDT
Next DST change: DST ends (the clock jumps one hour backwards) at
Sun 2021-11-07 01:59:59 MDT
Sun 2021-11-07 01:00:00 MST

可以看到我的时间是MDT,即GMT-6,要还原的话就+6

select to_utc_timestamp(unix_timestamp('2020-03-19T17:22:28.345', "yyyy-MM-dd'T'hh:mm:ss.SSS" )*1000, 'GMT+6');
+------------------------+
| _c0 |
+------------------------+
| 2020-03-19 17:22:28.0 |
+------------------------+

4.如果字段是时间戳类型的,hive是可以直接拿到毫秒的

select cast(current_timestamp() as double)*1000;
+--------------------+
| _c0 |
+--------------------+
| 1.616132202819E12 |
+--------------------+
select cast(timestamp '2020-03-19T17:22:28.345' as double)*1000;
+--------------------+
| _c0 |
+--------------------+
| 1.584638548345E12 |
+--------------------+

总结:如果要将字符串转时间戳,只需要精度为秒的话,直接用unix_timestamp函数是没有问题的;如果需要精确到毫秒的话,直接在字符串前添加timestamp关键字即可