当数据依赖于日期时间时,在数据库中保存日期时间和时区信息的最佳实践

关于在 DB 中保存日期时间和时区信息有很多问题,但更多的是在整体层面上。在这里,我想谈谈一个具体的案例。

系统规格

  • 我们有订单系统数据库
  • 它是一个多租户系统,租户可以使用任意时区(它是任意的,但是每个租户只有一个时区,保存在 Tenants 表中一次,永远不会更改)

数据库中需要包含业务规则

  • 当租户将一个 Order 放入系统时,订单数是根据它们的本地日期时间计算的(它不是字面上的数字,而是某种类型的标识符,如 ORDR-13432-Year-Month-Day)。精确的计算目前并不重要,重要的是它取决于租户的本地日期时间
  • We also do want to be able to select all Orders, on the system level, placed between some UTC datetimes regardless of the tenant (for general system statistics/reporting)

我们最初的想法

  • 我们最初的想法是在整个数据库中保存 UTC 日期时间,当然,保持租户时区相对于 UTC 的偏移,并让使用 DB 的应用程序始终将日期时间转换为 UTC,以便 DB 本身始终使用 UTC 操作。

进场1

  • 为每个租户保存本地租户的日期时间会很好,但是我们会遇到以下查询问题:

    SELECT * FROM ORDERS WHERE OrderDateTime BETWEEN UTCDateTime1 AND UTCDateTime2
    

It's problematic because OrderDateTime in this query means different moment in time, based on tenant. Of course, this query might include join to Tenants table to get local datetime offset which would then calculate OrderDateTime on the fly to make adjustments. It's possible, but not sure if it's a good way to do it?

Approach 2

  • On the other hand, when saving UTC datetime, then when we do calculation of OrderNumber since the day/month/year in UTC might differ from the one in local datetime

Let's take extreme example; let's say tenant is 6 hours ahead of UTC and his local datetime is 2017-01-01 02:00. UTC would be 2016-12-31 20:00. Order placed at that moment should get OrderNumber 'ORDR-13432-2017-1-1' but if saving UTC it would get ORDR-13432-2016-12-31.

In this case, at the moment of creating Order in DB, we should get UTC datetime, tenants offset and compile OrderNumber based on recalculated tenants localtime but still save DateTime column in UTC.

Questions

  1. What is the preferred way of handling this kind of situation?
  2. Is there a nice solution with saving UTC datetimes because that one would be pretty nice for us because of the system-level reporting?
  3. If going with saving UTC, is Approach 2) good way to handle those cases or is there some better/recommended way?

[UPDATE]

Based on comments from Gerard Ashton and Hugo:

Initial question was not clear with respect to the detail if tenant can change timezone or not and what happens if political authority changes the timezone properties or some terirory's timezone. Of course that is of an extreme importance, but it is not in the center of this question . We might address that in a separate question.

For the sake of this question, lets assume tenant will not change location. Timezone properties or timezone itself for that location might change and those changes will be handled in the system separately from this question.

79393 次浏览

I'd recommend to always use UTC internally, and convert to a timezone only when displaying the date to the user. So I tend to prefer approach 2.

If there's a business rule saying that the tenant's local date/time must be part of the identifier, so be it. But internally, you keep the order date in UTC.

Using your example: a tenant whose timezone is in UTC+06:00, so the tenant's local time is 2017-01-01 02:00, which is equivalent to 2016-12-31 20:00 in UTC.

The order identifier would be ORDR-13432-2017-1-1 and the order date would be UTC 2016-12-31 20:00Z.

To get all orders between 2 dates, this query is straighforward:

SELECT * FROM ORDERS WHERE OrderDateTime BETWEEN UTCDateTime1 AND UTCDateTime2

Because OrderDateTime is in UTC.

If looking for a specific tenant, then you can get the corresponding timezone, convert the date accordingly and search for it. Using the same example above (tenant's timezone is in UTC+06:00), to get all orders made in 2017-01-01 (in tenant's local time):

--get tenant timezone
--startUTC=tenant's local 2017-01-01 00:00 converted to UTC (2016-12-31T18:00Z)
--endUTC=tenant's local 2017-01-01 23:59:59.999 converted to UTC (2017-01-01T17:59:59.999)
SELECT * FROM ORDERS WHERE OrderDateTime between startUTC and endUTC

This will get ORDR-13432-2017-1-1 correctly.


To make queries for multiple tenants in different timezones, both approaches require a join, so none are "better" for this case.

Unless you create an extra column with the tenant's local date/time (the UTC OrderDateTime converted to tenant's timezone). It'll be redundant, but it can help you with queries that searches in more than one timezone. If that's a reasonable trade-off it will depend on how frequent those queries will be made.

Hugo's answer is mostly correct, but I'll add a few key points:

  • When you're storing the customer's time zone, do NOT store a numerical offset. As others have pointed out, the offset from UTC is only for a single point in time, and can easily change for DST and for other reasons. Instead, you should store a time zone identifier, preferably an IANA time zone identifier as a string, such as "America/Los_Angeles". Read more in the timezone tag wiki.

  • Your OrderDateTime field should absolutely represent the time in UTC. However, depending on your database platform, you have several choices for how to store this.

    • For example, if using Microsoft SQL Server, a good approach is to store the local time in a datetimeoffset column, which preserves the offset from UTC. Note that any index you create on that column will be based on the UTC equivalent, so you will get good query performance when doing your range query.

    • If using other database platforms, you may instead wish to store the UTC value in a timestamp field. Some databases also have timestamp with time zone, but understand that it doesn't mean it stores the time zone or offset, it just means that it can do conversions for you implicitly as you store and retrieve values. If you intend to always represent UTC, then often timestamp (without time zone) or just datetime is more appropriate.

  • Since either of the above methods will store a UTC time, you'll also need to consider how to perform operations that need an index of local time values. For example, you might need to create a daily report, based on the day of the user's time zone. For that, you'd need to group by the local date. If you try to compute that at query time from your UTC value, you'll end up scanning the entire table.

    A good approach to deal with this is to create a separate column for the local date (or perhaps even the local datetime depending on your needs, but not a datetimeoffset or timestamp). This could be a completely isolated column that you populate separately, or it could be a computed/calculated column based on your other column. Use this column in an index so you can filter or group by local date.

  • If you go for the computed-column approach, you'll need to know how to convert between time zones in the database. Some databases have a convert_tz function built-in that understands IANA time zone identifiers.

    If you're using Microsoft SQL Server, you can use the new AT TIME ZONE function in SQL 2016 and Azure SQL DB, but that only works with Microsoft time zone identifiers. To use IANA time zone identifiers, you'll need a third party solution, such as my SQL Server Time Zone Support project.

  • At query time, avoid using the BETWEEN statement. It is fully inclusive. It works ok for whole dates, but when you have time involved you're better off doing a half-open range query, such as:

    ... WHERE OrderDateTime >= @t1 AND OrderDateTime < @t2
    

    For example, if @t1 were the start of today, @t2 would be the start of tomorrow.

Regarding the scenario discussed in comments where the user's time zone has changed:

  • If you choose to calculate the local date in the database, the only scenario you need to worry about is if a location or business switches time zones without a "zone split" occurring. A zone split is when a new time zone identifier is introduced which covers the area that changed, including their old and new rules.

    For example, the latest zone added to the IANA tzdb at the time of writing this is America/Punta_Arenas, which was a zone split when the southern part of Chile decided to stay at UTC-3 when the rest of Chile (America/Santiago) went back to UTC-4 at the end of DST.

    However, if a minor locality on the border of two time zones decides to change which side they follow, and a zone split wasn't warranted, then you'd potentially be using the rules of their new time zone against their old data.

  • If you store the local date separately (computed in the application, not the DB), then you'll have no problems. The user changes their time zone to the new one, all old data is still intact, and new data is stored with the new time zone.