PEP 615 – Support for the IANA Time Zone Database in the Standard Library
- Author:
- Paul Ganssle <paul at ganssle.io>
- Discussions-To:
- Discourse thread
- Status:
- Final
- Type:
- Standards Track
- Created:
- 22-Feb-2020
- Python-Version:
- 3.9
- Post-History:
- 25-Feb-2020, 29-Mar-2020
- Replaces:
- 431
Table of Contents
- Abstract
- Motivation
- Proposal
- Backwards Compatibility
- Security Implications
- Reference Implementation
- Rejected Ideas
- Footnotes
- References
- Copyright
Abstract
This proposes adding a module, zoneinfo, to provide a concrete time zone
implementation supporting the IANA time zone database. By default,
zoneinfo will use the system’s time zone data if available; if no system
time zone data is available, the library will fall back to using the
first-party package tzdata, deployed on PyPI. [d]
Motivation
The datetime library uses a flexible mechanism to handle time zones: all
conversions and time zone information queries are delegated to an instance of a
subclass of the abstract datetime.tzinfo base class. [10] This allows
users to implement arbitrarily complex time zone rules, but in practice the
majority of users want support for just three types of time zone: [a]
- UTC and fixed offsets thereof
- The system local time zone
- IANA time zones
In Python 3.2, the datetime.timezone class was introduced to support the
first class of time zone (with a special datetime.timezone.utc singleton
for UTC).
While there is still no “local” time zone, in Python 3.0 the semantics of naïve time zones was changed to support many “local time” operations, and it is now possible to get a fixed time zone offset from a local time:
>>> print(datetime(2020, 2, 22, 12, 0).astimezone())
2020-02-22 12:00:00-05:00
>>> print(datetime(2020, 2, 22, 12, 0).astimezone()
... .strftime("%Y-%m-%d %H:%M:%S %Z"))
2020-02-22 12:00:00 EST
>>> print(datetime(2020, 2, 22, 12, 0).astimezone(timezone.utc))
2020-02-22 17:00:00+00:00
However, there is still no support for the time zones described in the IANA time zone database (also called the “tz” database or the Olson database [6]). The time zone database is in the public domain and is widely distributed — it is present by default on many Unix-like operating systems. Great care goes into the stability of the database: there are IETF RFCs both for the maintenance procedures (RFC 6557) and for the compiled binary (TZif) format (RFC 8536). As such, it is likely that adding support for the compiled outputs of the IANA database will add great value to end users even with the relatively long cadence of standard library releases.
Proposal
This PEP has three main concerns:
- The semantics of the zoneinfo.ZoneInfo class (zoneinfo-class)
- Time zone data sources used (data-sources)
- Options for configuration of the time zone search path (search-path-config)
Because of the complexity of the proposal, rather than having separate “specification” and “rationale” sections, the design decisions and rationales are grouped together by subject.
The zoneinfo.ZoneInfo class
Constructors
The initial design of the zoneinfo.ZoneInfo class has several constructors.
ZoneInfo(key: str)
The primary constructor takes a single argument, key, which is a string
indicating the name of a zone file in the system time zone database (e.g.
"America/New_York", "Europe/London"), and returns a ZoneInfo
constructed from the first matching data source on the search path (see the
data-sources section for more details). All zone information must be eagerly
read from the data source (usually a TZif file) upon construction, and may
not change during the lifetime of the object (this restriction applies to all
ZoneInfo constructors).
In the event that no matching file is found on the search path (either because
the system does not supply time zone data or because the key is invalid), the
constructor will raise a zoneinfo.ZoneInfoNotFoundError, which will be a
subclass of KeyError.
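Because ZoneInfoNotFoundError is specified as a subclass of KeyError, code that treats a missing zone like a missing dictionary key can catch either exception. The following is a minimal sketch; the helper name load_zone_or_none and the key "Not/A_Real_Zone" are hypothetical and chosen only for illustration.

```python
from typing import Optional
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def load_zone_or_none(key: str) -> Optional[ZoneInfo]:
    """Return the zone for `key`, or None if no data source provides it.

    Hypothetical helper, not part of the zoneinfo module.
    """
    try:
        return ZoneInfo(key)
    except KeyError:
        # ZoneInfoNotFoundError is a KeyError subclass, so this branch also
        # covers the case where no time zone data is installed at all.
        return None


# A key that is assumed not to exist in any data source.
missing = load_zone_or_none("Not/A_Real_Zone")
is_key_error = issubclass(ZoneInfoNotFoundError, KeyError)
```

Catching the narrower ZoneInfoNotFoundError instead is preferable when the surrounding code also performs ordinary dictionary lookups.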
One somewhat unusual guarantee made by this constructor is that calls with
identical arguments must return identical objects. Specifically, for all
values of key, the following assertion must always be valid [b]:
a = ZoneInfo(key)
b = ZoneInfo(key)
assert a is b
The reason for this comes from the fact that the semantics of datetime
operations (e.g. comparison, arithmetic) depend on whether the datetimes
involved represent the same or different zones; two datetimes are in the same
zone only if dt1.tzinfo is dt2.tzinfo. [1] In addition
to the modest performance benefit from avoiding unnecessary proliferation of
ZoneInfo objects, providing this guarantee should minimize surprising
behavior for end users.
dateutil.tz.gettz has provided a similar guarantee since version 2.7.0
(released March 2018). [16]
Note
The implementation may decide how to implement the cache behavior, but the
guarantee made here only requires that as long as two references exist to
the result of identical constructor calls, they must be references to the
same object. This is consistent with a reference counted cache where
ZoneInfo objects are ejected when no references to them exist (for
example, a cache implemented with a weakref.WeakValueDictionary) — it is
allowed but not required or recommended to implement this with a “strong”
cache, where all ZoneInfo objects are kept alive indefinitely.
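The weak-reference cache mentioned in the note can be sketched in a few lines. This is only an illustration of the caching contract, assuming a hypothetical CachedZone class that stores nothing but its key; it is not the implementation used by zoneinfo.

```python
import weakref


class CachedZone:
    """Hypothetical stand-in for ZoneInfo showing only the caching contract."""

    # Values are held weakly: an entry disappears once no other
    # reference to the cached object remains.
    _weak_cache = weakref.WeakValueDictionary()

    def __new__(cls, key: str):
        instance = cls._weak_cache.get(key)
        if instance is None:
            instance = super().__new__(cls)
            instance.key = key
            cls._weak_cache[key] = instance
        return instance


a = CachedZone("Europe/London")
b = CachedZone("Europe/London")
other = CachedZone("Asia/Tokyo")

same_object = a is b          # identical arguments, identical object
different_key = a is other    # different keys never share an object
```

As long as a or b is alive, every further call with the same key returns that object, which is all the guarantee requires.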
ZoneInfo.no_cache(key: str)
This is an alternate constructor that bypasses the constructor’s cache. It is identical to the primary constructor, but returns a new object on each call. This is likely most useful for testing purposes, or to deliberately induce “different zone” semantics between datetimes with the same nominal time zone.
Even if an object constructed by this method would have been a cache miss, it must not be entered into the cache; in other words, the following assertion should always be true:
>>> a = ZoneInfo.no_cache(key)
>>> b = ZoneInfo(key)
>>> a is not b
ZoneInfo.from_file(fobj: IO[bytes], /, key: str = None)
This is an alternate constructor that allows the construction of a ZoneInfo
object from any TZif byte stream. This constructor takes an optional
parameter, key, which sets the name of the zone, for the purposes of
__str__ and __repr__ (see String representation).
Unlike the primary constructor, this always constructs a new object. There are two reasons that this deviates from the primary constructor’s caching behavior: stream objects have mutable state and so determining whether two inputs are identical is difficult or impossible, and it is likely that users constructing from a file specifically want to load from that file and not a cache.
As with ZoneInfo.no_cache, objects constructed by this method must not be
added to the cache.
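As an illustration of from_file, the sketch below feeds it a hand-built TZif byte stream. The helper make_fixed_tzif and the key "Example/Plus_One" are hypothetical; the helper assumes the version-1 layout from RFC 8536 and encodes a single fixed-offset local time type with no transitions.

```python
import io
import struct
from datetime import datetime
from zoneinfo import ZoneInfo


def make_fixed_tzif(utcoffset_seconds: int, abbr: bytes) -> bytes:
    """Build a minimal version-1 TZif payload for a fixed-offset zone.

    Hypothetical helper for illustration only.
    """
    designations = abbr + b"\x00"
    # Header: magic, version byte, 15 unused bytes, then six counts
    # (isutcnt, isstdcnt, leapcnt, timecnt, typecnt, charcnt).
    header = struct.pack(
        ">4sc15x6l", b"TZif", b"\x00", 0, 0, 0, 0, 1, len(designations)
    )
    # One local time type record: UTC offset, is-DST flag, abbreviation index.
    ttinfo = struct.pack(">lBB", utcoffset_seconds, 0, 0)
    return header + ttinfo + designations


data = make_fixed_tzif(3600, b"ONE")

# Each call builds a new object, even for identical input bytes and key.
zone_a = ZoneInfo.from_file(io.BytesIO(data), key="Example/Plus_One")
zone_b = ZoneInfo.from_file(io.BytesIO(data), key="Example/Plus_One")
unnamed = ZoneInfo.from_file(io.BytesIO(data))

dt = datetime(2020, 4, 1, 3, 15, tzinfo=zone_a)
offset = dt.utcoffset()
```

In practice the stream would more often come from open(path, "rb") on an existing TZif file; the in-memory stream is used here only to keep the sketch independent of system data.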
Behavior during data updates
It is important that a given ZoneInfo object’s behavior not change during
its lifetime, because a datetime’s utcoffset() method is used in both
its equality and hash calculations, and if the result were to change during the
datetime’s lifetime, it could break the invariant for all hashable objects
[3] [4] that if x == y, it must also be true
that hash(x) == hash(y) [c] .
Considering both the preservation of datetime’s invariants and the
primary constructor’s contract to always return the same object when called
with identical arguments, if a source of time zone data is updated during a run
of the interpreter, it must not invalidate any caches or modify any
existing ZoneInfo objects. Newly constructed ZoneInfo objects, however,
should come from the updated data source.
This means that the point at which the data source is updated for new
invocations of the ZoneInfo constructor depends primarily on the semantics
of the caching behavior. The only guaranteed way to get a ZoneInfo object
from an updated data source is to induce a cache miss, either by bypassing the
cache and using ZoneInfo.no_cache or by clearing the cache.
Note
The specified cache behavior does not require that the cache be lazily populated — it is consistent with the specification (though not recommended) to eagerly pre-populate the cache with time zones that have never been constructed.
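The update behavior described in this section can be observed with a temporary search path. This is a sketch under stated assumptions: make_fixed_tzif is a hypothetical helper that writes a minimal version-1 TZif payload (layout per RFC 8536), "Example/Updated" is a made-up key, and the file rewrite stands in for a system data update.

```python
import struct
import tempfile
import zoneinfo
from datetime import datetime
from pathlib import Path


def make_fixed_tzif(utcoffset_seconds: int, abbr: bytes) -> bytes:
    """Hypothetical helper: minimal version-1 TZif for a fixed offset."""
    names = abbr + b"\x00"
    header = struct.pack(">4sc15x6l", b"TZif", b"\x00", 0, 0, 0, 0, 1, len(names))
    return header + struct.pack(">lBB", utcoffset_seconds, 0, 0) + names


KEY = "Example/Updated"
probe = datetime(2020, 1, 1)

with tempfile.TemporaryDirectory() as tmp:
    zone_file = Path(tmp, *KEY.split("/"))
    zone_file.parent.mkdir(parents=True)
    zone_file.write_bytes(make_fixed_tzif(3600, b"ONE"))
    zoneinfo.reset_tzpath(to=[tmp])

    old = zoneinfo.ZoneInfo(KEY)

    # Simulate a data update while the interpreter is running.
    zone_file.write_bytes(make_fixed_tzif(7200, b"TWO"))

    cached = zoneinfo.ZoneInfo(KEY)          # cache hit: the existing object
    fresh = zoneinfo.ZoneInfo.no_cache(KEY)  # cache bypassed: new data is read

old_offset = old.utcoffset(probe)
fresh_offset = fresh.utcoffset(probe)

# Undo the global changes made for the demonstration.
zoneinfo.ZoneInfo.clear_cache(only_keys=[KEY])
zoneinfo.reset_tzpath()
```

The existing object keeps reporting the offset it was built with, while only the cache-bypassing call sees the rewritten file.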
Deliberate cache invalidation
In addition to ZoneInfo.no_cache, which allows a user to bypass the
cache, ZoneInfo also exposes a clear_cache method to deliberately
invalidate either the entire cache or selective portions of the cache:
ZoneInfo.clear_cache(*, only_keys: Iterable[str]=None) -> None
If no arguments are passed, all caches are invalidated and the first call for
each key to the primary ZoneInfo constructor after the cache has been
cleared will return a new instance.
>>> NYC0 = ZoneInfo("America/New_York")
>>> NYC0 is ZoneInfo("America/New_York")
True
>>> ZoneInfo.clear_cache()
>>> NYC1 = ZoneInfo("America/New_York")
>>> NYC0 is NYC1
False
>>> NYC1 is ZoneInfo("America/New_York")
True
An optional parameter, only_keys, takes an iterable of keys to clear from
the cache, otherwise leaving the cache intact.
>>> NYC0 = ZoneInfo("America/New_York")
>>> LA0 = ZoneInfo("America/Los_Angeles")
>>> ZoneInfo.clear_cache(only_keys=["America/New_York"])
>>> NYC1 = ZoneInfo("America/New_York")
>>> LA1 = ZoneInfo("America/Los_Angeles")
>>> NYC0 is NYC1
False
>>> LA0 is LA1
True
Manipulation of the cache behavior is expected to be a niche use case; this function is primarily provided to facilitate testing, and to allow users with unusual requirements to tune the cache invalidation behavior to their needs.
String representation
The ZoneInfo class’s __str__ representation will be drawn from the
key parameter. This is partially because the key represents a
human-readable “name” of the zone, but also because it is a useful parameter
that users will want exposed. It is necessary to provide a mechanism to expose
the key for serialization between languages and because it is also a primary
key for localization projects like CLDR (the Unicode Common Locale Data
Repository [5]).
An example:
>>> zone = ZoneInfo("Pacific/Kwajalein")
>>> str(zone)
'Pacific/Kwajalein'
>>> dt = datetime(2020, 4, 1, 3, 15, tzinfo=zone)
>>> f"{dt.isoformat()} [{dt.tzinfo}]"
'2020-04-01T03:15:00+12:00 [Pacific/Kwajalein]'
When a key is not specified, the str operation should not fail, but
should return the object’s __repr__:
>>> zone = ZoneInfo.from_file(f)
>>> str(zone)
'ZoneInfo.from_file(<_io.BytesIO object at ...>)'
The __repr__ for a ZoneInfo is implementation-defined and not
necessarily stable between versions, but it must not be a valid ZoneInfo
key, to avoid confusion between a key-derived ZoneInfo with a valid
__str__ and a file-derived ZoneInfo which has fallen through to the
__repr__.
Since the use of str() to access the key provides no easy way to check
for the presence of a key (the only way is to try constructing a ZoneInfo
from it and detect whether it raises an exception), ZoneInfo objects will
also expose a read-only key attribute, which will be None in the event
that no key was supplied.
Pickle serialization
Rather than serializing all transition data, ZoneInfo objects will be
serialized by key, and ZoneInfo objects constructed from raw files (even
those with a value for key specified) cannot be pickled.
The behavior of a ZoneInfo object depends on how it was constructed:
- ZoneInfo(key): When constructed with the primary constructor, a ZoneInfo object will be serialized by key, and when deserialized, the deserializing process will use the primary constructor, so the result is expected to be the same object as other references to the same time zone. For example, if europe_berlin_pkl is a string containing a pickle constructed from ZoneInfo("Europe/Berlin"), one would expect the following behavior:
  >>> a = ZoneInfo("Europe/Berlin")
  >>> b = pickle.loads(europe_berlin_pkl)
  >>> a is b
  True
- ZoneInfo.no_cache(key): When constructed from the cache-bypassing constructor, the ZoneInfo object will still be serialized by key, but when deserialized, it will use the cache-bypassing constructor. If europe_berlin_pkl_nc is a string containing a pickle constructed from ZoneInfo.no_cache("Europe/Berlin"), one would expect the following behavior:
  >>> a = ZoneInfo("Europe/Berlin")
  >>> b = pickle.loads(europe_berlin_pkl_nc)
  >>> a is b
  False
- ZoneInfo.from_file(fobj, /, key=None): When constructed from a file, the ZoneInfo object will raise an exception on pickling. If an end user wants to pickle a ZoneInfo constructed from a file, it is recommended that they use a wrapper type or a custom serialization function: either serializing by key or storing the contents of the file object and serializing that.
This method of serialization requires that the time zone data for the required
key be available on both the serializing and deserializing side, similar to the
way that references to classes and functions are expected to exist in both the
serializing and deserializing environments. It also means that no guarantees
are made about the consistency of results when unpickling a ZoneInfo
pickled in an environment with a different version of the time zone data.
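The three pickling behaviors above can be exercised in one process. The sketch below is illustrative only: make_fixed_tzif and the key "Example/Pickled" are hypothetical, the zone data is a hand-built version-1 TZif payload placed on a temporary search path, and pickle.PicklingError is the exception CPython's implementation raises (the text above only promises "an exception").

```python
import io
import pickle
import struct
import tempfile
import zoneinfo
from pathlib import Path


def make_fixed_tzif(utcoffset_seconds: int, abbr: bytes) -> bytes:
    """Hypothetical helper: minimal version-1 TZif for a fixed offset."""
    names = abbr + b"\x00"
    header = struct.pack(">4sc15x6l", b"TZif", b"\x00", 0, 0, 0, 0, 1, len(names))
    return header + struct.pack(">lBB", utcoffset_seconds, 0, 0) + names


KEY = "Example/Pickled"
data = make_fixed_tzif(3600, b"ONE")

with tempfile.TemporaryDirectory() as tmp:
    zone_file = Path(tmp, *KEY.split("/"))
    zone_file.parent.mkdir(parents=True)
    zone_file.write_bytes(data)
    zoneinfo.reset_tzpath(to=[tmp])

    cached = zoneinfo.ZoneInfo(KEY)
    uncached = zoneinfo.ZoneInfo.no_cache(KEY)
    from_file = zoneinfo.ZoneInfo.from_file(io.BytesIO(data), key=KEY)

    # Serialized by key; unpickling goes through the primary constructor.
    cached_roundtrip = pickle.loads(pickle.dumps(cached))
    # Serialized by key; unpickling uses the cache-bypassing constructor.
    uncached_roundtrip = pickle.loads(pickle.dumps(uncached))

    # File-derived zones refuse to be pickled, even with a key.
    try:
        pickle.dumps(from_file)
    except pickle.PicklingError:
        file_zone_pickles = False
    else:
        file_zone_pickles = True

# Undo the global changes made for the demonstration.
zoneinfo.ZoneInfo.clear_cache(only_keys=[KEY])
zoneinfo.reset_tzpath()
```

Note that the unpickling calls must run while the key is still resolvable on the search path, which mirrors the requirement that the data be available on the deserializing side.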
Sources for time zone data
One of the hardest challenges for IANA time zone support is keeping the data up
to date; between 1997 and 2020, there have been between 3 and 21 releases per
year, often in response to changes in time zone rules with little to no notice
(see [7] for more details). In order to keep up to date,
and to give the system administrator control over the data source, we propose
to use system-deployed time zone data wherever possible. However, not all
systems ship a publicly accessible time zone database (notably, Windows uses a
different system for managing time zones), and so zoneinfo falls back, when it
is available, to an installable first-party package, tzdata, distributed on
PyPI. [d] If no system zoneinfo files are found but tzdata is installed, the
primary ZoneInfo constructor will use tzdata as the time zone source.
System time zone information
Many Unix-like systems deploy time zone data by default, or provide a canonical
time zone data package (often called tzdata, as it is on Arch Linux, Fedora,
and Debian). Whenever possible, it would be preferable to defer to the system
time zone information, because this allows time zone information for all
language stacks to be updated and maintained in one place. Python distributors
are encouraged to ensure that time zone data is installed alongside Python
whenever possible (e.g. by declaring tzdata as a dependency for the
python package).
The zoneinfo module will use a “search path” strategy analogous to the
PATH environment variable or the sys.path variable in Python; the
zoneinfo.TZPATH variable will be a read-only (see search-path-config for
more details), ordered list of time zone data locations to search. When
creating a ZoneInfo instance from a key, the zone file will be constructed
from the first data source on the path in which the key exists, so for example,
if TZPATH were:
TZPATH = (
"/usr/share/zoneinfo",
"/etc/zoneinfo"
)
and (although this would be very unusual) /usr/share/zoneinfo contained
only America/New_York and /etc/zoneinfo contained both
America/New_York and Europe/Moscow, then
ZoneInfo("America/New_York") would be satisfied by
/usr/share/zoneinfo/America/New_York, while ZoneInfo("Europe/Moscow")
would be satisfied by /etc/zoneinfo/Europe/Moscow.
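The first-match rule can be reproduced with two temporary directories standing in for /usr/share/zoneinfo and /etc/zoneinfo. This is a sketch under stated assumptions: the helpers make_fixed_tzif and write_zone and the "Example/..." keys are hypothetical, and the two directories hold hand-built version-1 TZif files with different offsets so that the source of each zone is visible.

```python
import struct
import tempfile
import zoneinfo
from datetime import datetime
from pathlib import Path


def make_fixed_tzif(utcoffset_seconds: int, abbr: bytes) -> bytes:
    """Hypothetical helper: minimal version-1 TZif for a fixed offset."""
    names = abbr + b"\x00"
    header = struct.pack(">4sc15x6l", b"TZif", b"\x00", 0, 0, 0, 0, 1, len(names))
    return header + struct.pack(">lBB", utcoffset_seconds, 0, 0) + names


def write_zone(root: str, key: str, seconds: int, abbr: bytes) -> None:
    """Hypothetical helper: store a zone file for `key` under `root`."""
    path = Path(root, *key.split("/"))
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(make_fixed_tzif(seconds, abbr))


probe = datetime(2020, 1, 1)

with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    # `first` provides only Example/Shared; `second` provides both keys.
    write_zone(first, "Example/Shared", 3600, b"ONE")
    write_zone(second, "Example/Shared", 7200, b"TWO")
    write_zone(second, "Example/Only_Second", 7200, b"TWO")
    zoneinfo.reset_tzpath(to=[first, second])

    shared = zoneinfo.ZoneInfo("Example/Shared")            # found in `first`
    only_second = zoneinfo.ZoneInfo("Example/Only_Second")  # falls through

shared_name = shared.tzname(probe)
only_second_name = only_second.tzname(probe)

# Undo the global changes made for the demonstration.
zoneinfo.ZoneInfo.clear_cache(only_keys=["Example/Shared", "Example/Only_Second"])
zoneinfo.reset_tzpath()
```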
At the moment, on Windows systems, the search path will default to empty, because Windows does not officially ship a copy of the time zone database. On non-Windows systems, the search path will default to a list of the most commonly observed search paths. Although this is subject to change in future versions, at launch the default search path will be:
TZPATH = (
"/usr/share/zoneinfo",
"/usr/lib/zoneinfo",
"/usr/share/lib/zoneinfo",
"/etc/zoneinfo",
)
This may be configured either at compile time or at runtime; see search-path-config for more information on configuration options.
The tzdata Python package
In order to ensure easy access to time zone data for all end users, this PEP
proposes to create a data-only package tzdata as a fallback for when system
data is not available. The tzdata package would be distributed on PyPI as
a “first party” package [d], maintained by the CPython development team.
The tzdata package contains only data and metadata, with no public-facing
functions or classes. It will be designed to be compatible with both newer
importlib.resources [11] access patterns and older
access patterns like pkgutil.get_data [12] .
While it is designed explicitly for the use of CPython, the tzdata package
is intended as a public package in its own right, and it may be used as an
“official” source of time zone data for third party Python packages.
Search path configuration
The time zone search path is very system-dependent, and sometimes even application-dependent, and as such it makes sense to provide options to customize it. This PEP provides for three such avenues for customization:
- Global configuration via a compile-time option
- Per-run configuration via environment variables
- Runtime configuration change via a reset_tzpath function
In all methods of configuration, the search path must consist of only absolute,
rather than relative paths. Implementations may choose to ignore, warn or raise
an exception if a string other than an absolute path is found (and may make
different choices depending on the context — e.g. raising an exception when an
invalid path is passed to reset_tzpath but warning when one is included in
the environment variable). If an exception is not raised, any strings other
than an absolute path must not be included in the time zone search path.
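A short sketch of the absolute-path rule follows. It relies on CPython's behavior, where reset_tzpath raises ValueError for a relative path; since the text above also permits ignoring or warning, that choice should be read as an implementation detail. The directory names are placeholders and need not exist.

```python
import os
import zoneinfo

# A relative path is rejected (CPython raises ValueError here).
try:
    zoneinfo.reset_tzpath(to=["relative/zoneinfo"])
except ValueError:
    relative_path_rejected = True
else:
    relative_path_rejected = False

# An absolute path is accepted even if nothing exists at that location.
absolute = os.path.abspath("zoneinfo-data")
zoneinfo.reset_tzpath(to=[absolute])
configured = zoneinfo.TZPATH

# Restore the default search path.
zoneinfo.reset_tzpath()
```

Reading the value through zoneinfo.TZPATH, rather than importing the name once, matters here because each reset rebinds the module attribute.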
Compile-time options
It is most likely that downstream distributors will know exactly where their
system time zone data is deployed, and so a compile-time option
TZPATH will be provided to set the default search path.
The TZPATH option should be a string delimited by os.pathsep,
listing possible locations for the time zone data to be deployed (e.g.
/usr/share/zoneinfo).
Environment variables
When initializing TZPATH (and whenever reset_tzpath is called with no
arguments), the zoneinfo module will use the environment variable
PYTHONTZPATH, if it exists, to set the search path.
PYTHONTZPATH is an os.pathsep-delimited string which replaces (rather
than augments) the default time zone path. Some examples of the proposed
semantics:
$ python print_tzpath.py
("/usr/share/zoneinfo",
"/usr/lib/zoneinfo",
"/usr/share/lib/zoneinfo",
"/etc/zoneinfo")
$ PYTHONTZPATH="/etc/zoneinfo:/usr/share/zoneinfo" python print_tzpath.py
("/etc/zoneinfo",
"/usr/share/zoneinfo")
$ PYTHONTZPATH="" python print_tzpath.py
()
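The same replace-not-augment semantics can be observed from within a running interpreter by changing the environment and calling reset_tzpath with no arguments. This is a minimal sketch; the directory names tz-first and tz-second are placeholders and need not exist.

```python
import os
import zoneinfo

first = os.path.abspath("tz-first")
second = os.path.abspath("tz-second")

saved = os.environ.get("PYTHONTZPATH")
try:
    # The variable replaces the default search path entirely.
    os.environ["PYTHONTZPATH"] = os.pathsep.join([first, second])
    zoneinfo.reset_tzpath()  # no arguments: re-read the environment
    replaced = zoneinfo.TZPATH

    # An empty value yields an empty search path, not the default one.
    os.environ["PYTHONTZPATH"] = ""
    zoneinfo.reset_tzpath()
    emptied = zoneinfo.TZPATH
finally:
    # Restore the original environment and search path.
    if saved is None:
        del os.environ["PYTHONTZPATH"]
    else:
        os.environ["PYTHONTZPATH"] = saved
    zoneinfo.reset_tzpath()
```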
This provides no built-in mechanism for prepending or appending to the default search path, as these use cases are likely to be somewhat more niche. It should be possible to populate an environment variable with the default search path fairly easily:
$ export DEFAULT_TZPATH=$(python -c \
"import os, zoneinfo; print(os.pathsep.join(zoneinfo.TZPATH))")