How to remove the timezone from a Timestamp column in a pandas DataFrame

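I read "Pandas change timezone for forex DataFrame", but I'd like to make the time column of my dataframe timezone naive for interoperability with an sqlite3 database.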

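The data in my pandas dataframe has already been converted to UTC, but I do not want to have to maintain this UTC timezone information in the database.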

A sample of the data, derived from other sources, looks like this:

print(type(testdata))
print(testdata)
print(testdata.applymap(type))

gives:

<class 'pandas.core.frame.DataFrame'>
                       time  navd88_ft  station_id  new
0 2018-03-07 01:31:02+00:00  -0.030332          13    5
1 2018-03-07 01:21:02+00:00  -0.121653          13    5
2 2018-03-07 01:26:02+00:00  -0.072945          13    5
3 2018-03-07 01:16:02+00:00  -0.139917          13    5
4 2018-03-07 01:11:02+00:00  -0.152085          13    5
                                     time        navd88_ft     station_id  \
0  <class 'pandas._libs.tslib.Timestamp'>  <class 'float'>  <class 'int'>
1  <class 'pandas._libs.tslib.Timestamp'>  <class 'float'>  <class 'int'>
2  <class 'pandas._libs.tslib.Timestamp'>  <class 'float'>  <class 'int'>
3  <class 'pandas._libs.tslib.Timestamp'>  <class 'float'>  <class 'int'>
4  <class 'pandas._libs.tslib.Timestamp'>  <class 'float'>  <class 'int'>

             new
0  <class 'int'>
1  <class 'int'>
2  <class 'int'>
3  <class 'int'>
4  <class 'int'>

But

newstamp = testdata['time'].tz_convert(None)

gives this error:

TypeError: index is not a valid DatetimeIndex or PeriodIndex

How do I replace the column with timezone-naive timestamps?


The column must have a datetime dtype, for example after using pd.to_datetime. Then you can use tz_localize to change the time zone; a naive timestamp corresponds to time zone None:

testdata['time'].dt.tz_localize(None)

Unless the column is an index (DatetimeIndex), the .dt accessor must be used to access pandas datetime functions.
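
As a minimal sketch of the point above: the frame below is hypothetical sample data modeled on the question's output, not the asker's real data. It shows that the datetime methods are reached through .dt on a column, and that tz_localize(None) drops the timezone while leaving the clock time unchanged.

```python
import pandas as pd

# Hypothetical sample data modeled on the frame shown in the question.
testdata = pd.DataFrame({
    "time": pd.to_datetime(
        ["2018-03-07 01:31:02+00:00", "2018-03-07 01:21:02+00:00"],
        utc=True,
    ),
    "navd88_ft": [-0.030332, -0.121653],
})

# Series.tz_convert / Series.tz_localize operate on the index of the
# Series, so for column values the methods are reached through .dt.
naive = testdata["time"].dt.tz_localize(None)

print(testdata["time"].dt.tz)  # UTC
print(naive.dt.tz)             # None
print(naive.iloc[0])           # 2018-03-07 01:31:02
```

The result can be assigned back with testdata['time'] = naive to replace the column.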

When your data contains datetimes spanning different timezones, or datetimes from before and after a daylight saving time change (e.g. obtained from a Postgres database with psycopg2), then depending on the pandas version you might end up in a scenario where the best method of conversion is:

testdata['time'].apply(lambda x: x.replace(tzinfo=None))
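
A small sketch of what this does, using only the standard library's datetime.timezone to build two hypothetical values with different fixed offsets (a stand-in for what a database driver might return). Note that replace(tzinfo=None) keeps each value's local clock time; if UTC clock times are wanted instead, converting with pd.to_datetime(..., utc=True) first is one alternative, shown for comparison.

```python
import datetime as dt
import pandas as pd

# Hypothetical values carrying two different fixed UTC offsets.
values = pd.Series([
    dt.datetime(2018, 5, 17, 21, 40, 20,
                tzinfo=dt.timezone(dt.timedelta(hours=2))),
    dt.datetime(2021, 3, 17, 14, 36, 13,
                tzinfo=dt.timezone(dt.timedelta(hours=1))),
])

# Element-wise: drop tzinfo, keeping each value's local clock time.
local_naive = pd.to_datetime(values.apply(lambda x: x.replace(tzinfo=None)))

# Alternative: normalize everything to UTC first, then drop the timezone.
utc_naive = pd.to_datetime(values, utc=True).dt.tz_localize(None)

print(local_naive.iloc[0])  # 2018-05-17 21:40:20
print(utc_naive.iloc[0])    # 2018-05-17 19:40:20
```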

Scenarios where this works while .dt.tz_localize(None) does not (note the FixedOffsetTimezone values with different offsets):

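import datetime
import pandas as pd
import psycopg2.tz

# Two timezone-aware datetimes with different fixed offsets (in minutes).
df = pd.DataFrame([
    datetime.datetime(2018, 5, 17, 21, 40, 20, 775854,
                      tzinfo=psycopg2.tz.FixedOffsetTimezone(offset=120, name=None)),
    datetime.datetime(2021, 3, 17, 14, 36, 13, 902741,
                      tzinfo=psycopg2.tz.FixedOffsetTimezone(offset=60, name=None))
])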


pd.__version__
'0.24.2'

df[0].dt.tz_localize(None)

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1861, in objects_to_datetime64ns
    values, tz_parsed = conversion.datetime_to_datetime64(data)
  File "pandas/_libs/tslibs/conversion.pyx", line 185, in pandas._libs.tslibs.conversion.datetime_to_datetime64
ValueError: Array must be all same time zone

pd.__version__
'1.1.2'

df[0].dt.tz_localize(None)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/pandas/core/generic.py", line 5132, in __getattr__
    return object.__getattribute__(self, name)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/accessor.py", line 187, in __get__
    accessor_obj = self._accessor(obj)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/indexes/accessors.py", line 480, in __new__
    raise AttributeError("Can only use .dt accessor with datetimelike values")
AttributeError: Can only use .dt accessor with datetimelike values


I know that you mentioned that your timestamps are already in UTC, but just to be defensive, you might as well make your code impervious to the case where timestamps (some or all of them) were in a different timezone. This doesn't cost anything, and will be more robust:

newcol = testdata['time'].dt.tz_convert(None)

As per the docs:

A tz of None will convert to UTC and remove the timezone information.

This is safer than just dropping any timezone the timestamps may contain.
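
To illustrate the difference, here is a short sketch with a hypothetical timestamp at a UTC-05:00 offset (the value is made up for the example). tz_convert(None) shifts the clock time to UTC before dropping the timezone, whereas tz_localize(None) drops the timezone and leaves the local clock time as it was.

```python
import pandas as pd

# Hypothetical timestamp that is NOT in UTC (fixed offset of -05:00).
s = pd.Series(pd.to_datetime(["2018-03-07 01:31:02-05:00"]))

converted = s.dt.tz_convert(None)   # shift to UTC, then drop the timezone
localized = s.dt.tz_localize(None)  # drop the timezone, keep local clock time

print(converted.iloc[0])  # 2018-03-07 06:31:02
print(localized.iloc[0])  # 2018-03-07 01:31:02
```

For data that is already in UTC, as in the question, both calls give the same result.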

Here is a function that will

  • find all columns with any instance of pd.Timestamp in them
  • convert those columns to a datetime dtype (to be able to use the .dt accessor on the Series)
  • localize all timestamps with dt.tz_localize(None), which keeps each timestamp's local clock time (the offset from UTC is dropped, not applied)
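
import pandas as pd

def remove_tz_from_dataframe(df_in):
    # Work on a copy so the caller's DataFrame is left untouched.
    df = df_in.copy()
    # Columns holding at least one pd.Timestamp value.
    col_times = [col for col in df.columns
                 if any(isinstance(x, pd.Timestamp) for x in df[col])]
    for col in col_times:
        # Ensure a datetime dtype so the .dt accessor is available.
        df[col] = pd.to_datetime(df[col])
        # Drop the timezone, keeping the local clock time.
        df[col] = df[col].dt.tz_localize(None)
    return df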
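
To tie this back to the sqlite3 use case from the question, here is a rough sketch of a round trip through an in-memory database once the timezone has been removed. The table name readings and the sample rows are hypothetical, and the exact text format sqlite3 stores may vary with the pandas and Python versions, so treat this as an illustration under those assumptions rather than a definitive recipe.

```python
import sqlite3
import pandas as pd

# Hypothetical UTC data modeled on the question's frame.
df = pd.DataFrame({
    "time": pd.to_datetime(
        ["2018-03-07 01:31:02+00:00", "2018-03-07 01:21:02+00:00"],
        utc=True,
    ),
    "navd88_ft": [-0.030332, -0.121653],
})

# Make the column timezone naive before writing it out.
df["time"] = df["time"].dt.tz_localize(None)

conn = sqlite3.connect(":memory:")
try:
    # "readings" is a made-up table name for this example.
    df.to_sql("readings", conn, index=False)
    # parse_dates turns the stored text back into datetime64 values.
    restored = pd.read_sql("SELECT * FROM readings", conn,
                           parse_dates=["time"])
finally:
    conn.close()

print(restored["time"].dt.tz)  # None
```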