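Definition of time zones

We use longitude and latitude [1] to identify any point on Earth.

Theoretical time zones

Latitude has the equator as a natural origin, but longitude has no natural origin. By convention, the meridian passing through the former site of the Royal Observatory at Greenwich, London, is used as the origin (the prime meridian).

Each theoretical time zone is 15° wide, so there are 360 / 15 = 24 zones. A day has 24 hours, so each zone corresponds to exactly one hour. Heading east from the prime meridian, the zones are named: the zero zone (centered on the prime meridian), East 1, East 2, ... East 12, West 11, West 10, ... West 1 [2].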
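The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `theoretical_zone` and the sign convention (east positive, west negative) are assumptions made here, not something the original text defines.

```python
def theoretical_zone(longitude_deg: float) -> int:
    """Return the theoretical zone as a whole-hour offset in [-12, 12].

    Hypothetical helper for illustration. Each zone is 15 degrees wide and
    centered on a multiple of 15 degrees, so rounding longitude / 15 to the
    nearest integer gives the zone. East is positive, west is negative.
    """
    if not -180.0 <= longitude_deg <= 180.0:
        raise ValueError("longitude must be within [-180, 180]")
    return round(longitude_deg / 15.0)


print(theoretical_zone(0.0))    # Greenwich -> 0
print(theoretical_zone(116.4))  # roughly Beijing -> 8
print(theoretical_zone(-74.0))  # roughly New York -> -5
```

Results of 12 and -12 describe the same zone around the 180° meridian, and longitudes that fall exactly on a zone boundary (for example 7.5°) are ambiguous in this sketch.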

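Because the Earth rotates from west to east, the farther east a time zone lies, the earlier it sees the sun, so its clocks run ahead of the zones to its west.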
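A small sketch of what "east is ahead" means in practice, using only the standard library. The two fixed offsets are arbitrary examples chosen here (UTC+8 and UTC-5), not zones taken from the text.

```python
from datetime import datetime, timedelta, timezone

east_8 = timezone(timedelta(hours=8))   # a zone 8 hours east of Greenwich
west_5 = timezone(timedelta(hours=-5))  # a zone 5 hours west of Greenwich

# One single instant, expressed on two different wall clocks.
instant = datetime(2018, 1, 1, 12, 0, tzinfo=timezone.utc)
in_east = instant.astimezone(east_8)
in_west = instant.astimezone(west_5)

print(in_east.isoformat())  # 2018-01-01T20:00:00+08:00
print(in_west.isoformat())  # 2018-01-01T07:00:00-05:00
print(in_east == in_west)   # True: same instant, different clock readings
```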

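Actual time zones

To avoid cutting across national borders, some time zones have irregular shapes, and larger countries draw time zone boundaries along their internal administrative borders. These are the actual time zones, that is, the legal time zones. [2]

A single country may span several time zones, or it may use just one.

  • For example, in the United States, Hawaii is UTC-10 while California is UTC-8
  • China's territory spans five theoretical zones, from East 5 (UTC+5) to East 9 (UTC+9), but the country uses only one time zone: Beijing Time (UTC+8)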

Time zones change

Why is subtracting these two times (in 1927) giving a strange result?

The time zone offset is incorrect [4]

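Some people suspect that the time zone database, rather than pytz, is at fault, but I could not find reports of the same problem in time zone libraries for other languages. A more likely explanation, going by the LMT label in the output below, is that a pytz tzinfo attached directly through the datetime constructor falls back to the zone's earliest entry (local mean time), which localize then corrects.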

>>> import datetime
>>> import pytz
>>> shanghai_tz = pytz.timezone('Asia/Shanghai')
>>> # Passing the time zone to the constructor gives an inaccurate offset (LMT+8:06)
>>> datetime.datetime(2018, 1, 1, tzinfo=shanghai_tz)
datetime.datetime(2018, 1, 1, 0, 0, tzinfo=<DstTzInfo 'Asia/Shanghai' LMT+8:06:00 STD>)

>>> # Use localize, as the pytz documentation recommends, to get the correct offset
>>> shanghai_tz.localize(datetime.datetime(2018, 1, 1))
datetime.datetime(2018, 1, 1, 0, 0, tzinfo=<DstTzInfo 'Asia/Shanghai' CST+8:00:00 STD>)

The source of localize in pytz.tzinfo (the DstTzInfo.localize method) is fairly involved:

def localize(self, dt, is_dst=False):
    '''Convert naive time to local time.

    This method should be used to construct localtimes, rather
    than passing a tzinfo argument to a datetime constructor.

    is_dst is used to determine the correct timezone in the ambigous
    period at the end of daylight saving time.

    >>> from pytz import timezone
    >>> fmt = '%Y-%m-%d %H:%M:%S %Z (%z)'
    >>> amdam = timezone('Europe/Amsterdam')
    >>> dt  = datetime(2004, 10, 31, 2, 0, 0)
    >>> loc_dt1 = amdam.localize(dt, is_dst=True)
    >>> loc_dt2 = amdam.localize(dt, is_dst=False)
    >>> loc_dt1.strftime(fmt)
    '2004-10-31 02:00:00 CEST (+0200)'
    >>> loc_dt2.strftime(fmt)
    '2004-10-31 02:00:00 CET (+0100)'
    >>> str(loc_dt2 - loc_dt1)
    '1:00:00'

    Use is_dst=None to raise an AmbiguousTimeError for ambiguous
    times at the end of daylight saving time

    >>> try:
    ...     loc_dt1 = amdam.localize(dt, is_dst=None)
    ... except AmbiguousTimeError:
    ...     print('Ambiguous')
    Ambiguous

    is_dst defaults to False

    >>> amdam.localize(dt) == amdam.localize(dt, False)
    True

    is_dst is also used to determine the correct timezone in the
    wallclock times jumped over at the start of daylight saving time.

    >>> pacific = timezone('US/Pacific')
    >>> dt = datetime(2008, 3, 9, 2, 0, 0)
    >>> ploc_dt1 = pacific.localize(dt, is_dst=True)
    >>> ploc_dt2 = pacific.localize(dt, is_dst=False)
    >>> ploc_dt1.strftime(fmt)
    '2008-03-09 02:00:00 PDT (-0700)'
    >>> ploc_dt2.strftime(fmt)
    '2008-03-09 02:00:00 PST (-0800)'
    >>> str(ploc_dt2 - ploc_dt1)
    '1:00:00'

    Use is_dst=None to raise a NonExistentTimeError for these skipped
    times.

    >>> try:
    ...     loc_dt1 = pacific.localize(dt, is_dst=None)
    ... except NonExistentTimeError:
    ...     print('Non-existent')
    Non-existent
    '''
    if dt.tzinfo is not None:
        raise ValueError('Not naive datetime (tzinfo is already set)')

    # Find the two best possibilities.
    possible_loc_dt = set()
    for delta in [timedelta(days=-1), timedelta(days=1)]:
        loc_dt = dt + delta
        idx = max(0, bisect_right(
            self._utc_transition_times, loc_dt) - 1)
        inf = self._transition_info[idx]
        tzinfo = self._tzinfos[inf]
        loc_dt = tzinfo.normalize(dt.replace(tzinfo=tzinfo))
        if loc_dt.replace(tzinfo=None) == dt:
            possible_loc_dt.add(loc_dt)

    if len(possible_loc_dt) == 1:
        return possible_loc_dt.pop()

    # If there are no possibly correct timezones, we are attempting
    # to convert a time that never happened - the time period jumped
    # during the start-of-DST transition period.
    if len(possible_loc_dt) == 0:
        # If we refuse to guess, raise an exception.
        if is_dst is None:
            raise NonExistentTimeError(dt)

        # If we are forcing the pre-DST side of the DST transition, we
        # obtain the correct timezone by winding the clock forward a few
        # hours.
        elif is_dst:
            return self.localize(
                dt + timedelta(hours=6), is_dst=True) - timedelta(hours=6)

        # If we are forcing the post-DST side of the DST transition, we
        # obtain the correct timezone by winding the clock back.
        else:
            return self.localize(
                dt - timedelta(hours=6),
                is_dst=False) + timedelta(hours=6)

    # If we get this far, we have multiple possible timezones - this
    # is an ambiguous case occuring during the end-of-DST transition.

    # If told to be strict, raise an exception since we have an
    # ambiguous case
    if is_dst is None:
        raise AmbiguousTimeError(dt)

    # Filter out the possiblilities that don't match the requested
    # is_dst
    filtered_possible_loc_dt = [
        p for p in possible_loc_dt if bool(p.tzinfo._dst) == is_dst
    ]

    # Hopefully we only have one possibility left. Return it.
    if len(filtered_possible_loc_dt) == 1:
        return filtered_possible_loc_dt[0]

    if len(filtered_possible_loc_dt) == 0:
        filtered_possible_loc_dt = list(possible_loc_dt)

    # If we get this far, we have in a wierd timezone transition
    # where the clocks have been wound back but is_dst is the same
    # in both (eg. Europe/Warsaw 1915 when they switched to CET).
    # At this point, we just have to guess unless we allow more
    # hints to be passed in (such as the UTC offset or abbreviation),
    # but that is just getting silly.
    #
    # Choose the earliest (by UTC) applicable timezone if is_dst=True
    # Choose the latest (by UTC) applicable timezone if is_dst=False
    # i.e., behave like end-of-DST transition
    dates = {}  # utc -> local
    for local_dt in filtered_possible_loc_dt:
        utc_time = (
            local_dt.replace(tzinfo=None) - local_dt.tzinfo._utcoffset)
        assert utc_time not in dates
        dates[utc_time] = local_dt
    return dates[[min, max][not is_dst](dates)]

To summarize: with pytz, use tz.localize() to obtain a time zone-aware datetime, and use dt_with_tz.astimezone(another_tz) to convert an aware datetime to another time zone [5].

Program design

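Time zone offsets are expressed in hours (and for some zones, such as India at UTC+5:30, in fractions of an hour). Therefore: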

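  • If time zones need to be distinguished, the stored time must include the time of day, at least down to the hour. For example, you cannot store only the date 2018-01-01
  • The stored time must also carry a time zone. For example, MongoDB stores datetimes in UTC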

The official BSON specification refers to the BSON Date type as the UTC datetime.[3]
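Both points can be seen in a short standard-library sketch. The fixed UTC+8 offset stands in for Beijing Time and the sample timestamp is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

BEIJING = timezone(timedelta(hours=8))  # fixed UTC+8 offset, used as a stand-in

local = datetime(2018, 1, 1, 0, 30, tzinfo=BEIJING)
as_utc = local.astimezone(timezone.utc)

print(local.isoformat())   # 2018-01-01T00:30:00+08:00
print(as_utc.isoformat())  # 2017-12-31T16:30:00+00:00

# The same instant falls on different calendar days in the two zones,
# so a stored date without time of day and zone cannot be converted.
print(local.date() == as_utc.date())  # False
```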

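The time zone used by a program should not be tied to the time zone of the machine it runs on. Otherwise, if the program is migrated from a data center in China to one in the United States, it will start to misbehave.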
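One way to avoid depending on the host is to name the zone explicitly whenever the current time is read. The sketch below assumes a hypothetical `APP_TZ` constant and a hypothetical helper `now_in_app_tz`; in a real program the zone would likely come from configuration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical application-level setting: a fixed UTC+8 offset.
APP_TZ = timezone(timedelta(hours=8))


def now_in_app_tz() -> datetime:
    """Return the current time in the application's zone, not the host's."""
    return datetime.now(tz=APP_TZ)


current = now_in_app_tz()
print(current.isoformat())
print(current.utcoffset())  # 8:00:00, whatever the machine's own zone is
```

By contrast, a bare `datetime.now()` returns a naive value based on the machine's local clock, which is exactly the coupling described above.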

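When only one time zone is needed

Most programs built for China, for example, only need to consider a single time zone: Beijing Time. I will call this the local time zone.

I think such a program (both front end and back end) can use the local time zone exclusively. The benefits are:

  • Better readability and less confusion. When debugging, Beijing Time is certainly more intuitive to read than UTC
  • No unnecessary time zone conversions

If the database's time zone can be changed, set it to the local time zone as well. Otherwise, use the database's time zone.

For example, MongoDB stores datetimes in UTC and this cannot be changed (I did not find a setting for it). So if tables are split by month, the month boundaries should also be computed in UTC, which keeps everything on the database side consistently in UTC. Splitting months by the local time zone would introduce a mismatch.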
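A minimal sketch of UTC-based monthly partitioning follows. The helper name `monthly_collection` and the `prefix_YYYY_MM` naming scheme are assumptions for illustration, not a MongoDB convention.

```python
from datetime import datetime, timedelta, timezone


def monthly_collection(prefix: str, moment: datetime) -> str:
    """Return a hypothetical per-month name based on the UTC month."""
    if moment.tzinfo is None:
        raise ValueError("expected a time zone-aware datetime")
    utc_moment = moment.astimezone(timezone.utc)
    return f"{prefix}_{utc_moment:%Y_%m}"


beijing = timezone(timedelta(hours=8))  # fixed UTC+8 offset as a stand-in
moment = datetime(2018, 2, 1, 0, 30, tzinfo=beijing)

# 00:30 on 1 February in UTC+8 is still 31 January in UTC.
print(monthly_collection("events", moment))  # events_2018_01
```

The example timestamp is chosen to sit in the eight-hour window where the local month and the UTC month differ, which is where the two partitioning schemes would disagree.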

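When multiple time zones are needed

Use UTC inside the program, and convert to the time zone chosen by the user only when displaying data.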
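This pattern can be sketched with the standard library alone. The helpers `to_storage` and `to_display` and the two user offsets are hypothetical. Fixed offsets are used to keep the example self-contained, so daylight saving time is not handled here.

```python
from datetime import datetime, timedelta, timezone


def to_storage(naive_local: datetime, user_tz: timezone) -> datetime:
    """Interpret user input in the user's zone and convert it to UTC."""
    # replace() is safe with fixed-offset zones; pytz zones need localize().
    return naive_local.replace(tzinfo=user_tz).astimezone(timezone.utc)


def to_display(stored_utc: datetime, user_tz: timezone) -> str:
    """Render a stored UTC datetime in the zone the user selected."""
    return stored_utc.astimezone(user_tz).strftime("%Y-%m-%d %H:%M %z")


user_a_tz = timezone(timedelta(hours=9))  # hypothetical user at UTC+9
user_b_tz = timezone(timedelta(hours=0))  # hypothetical user at UTC+0

stored = to_storage(datetime(2018, 1, 1, 9, 0), user_a_tz)
print(stored.isoformat())              # 2018-01-01T00:00:00+00:00
print(to_display(stored, user_a_tz))   # 2018-01-01 09:00 +0900
print(to_display(stored, user_b_tz))   # 2018-01-01 00:00 +0000
```

Conversions happen only at the edges (input and display), so everything in between can compare and store datetimes without worrying about zones.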