How do I properly set a DatetimeIndex from pandas datetime objects in a DataFrame?

I have a pandas DataFrame:

   lat        lng         alt  days          date        time
0  40.003834  116.321462  211  39745.175405  2008-10-24  04:12:35
1  40.003783  116.321431  201  39745.175463  2008-10-24  04:12:40
2  40.003690  116.321429  203  39745.175521  2008-10-24  04:12:45
3  40.003589  116.321427  194  39745.175579  2008-10-24  04:12:50
4  40.003522  116.321412  190  39745.175637  2008-10-24  04:12:55
5  40.003509  116.321484  188  39745.175694  2008-10-24  04:13:00

I tried to convert the df['date'] and df['time'] columns to datetime:

df['Datetime'] = pd.to_datetime(df['date']+df['time'])
df = df.set_index(['Datetime'])
del df['date']
del df['time']

Then I got:

                    lat         lng         alt days
Datetime
2008-10-2404:12:35  40.003834   116.321462  211 39745.175405
2008-10-2404:12:40  40.003783   116.321431  201 39745.175463
2008-10-2404:12:45  40.003690   116.321429  203 39745.175521
2008-10-2404:12:50  40.003589   116.321427  194 39745.175579
2008-10-2404:12:55  40.003522   116.321412  190 39745.175637

But if I try:

df.between_time(time(1),time(22,59,59))['lng'].std()

I get an error: 'TypeError: Index must be DatetimeIndex'

So I also tried setting a DatetimeIndex explicitly:

df['Datetime'] = pd.to_datetime(df['date']+df['time'])
#df = df.set_index(['Datetime'])
df = df.set_index(pd.DatetimeIndex(df['Datetime']))
del df['date']
del df['time']

This also raises an error: 'DateParseError: unknown string format'

How do I properly create the datetime column and the DatetimeIndex so that df.between_time() works?


You are not creating the DatetimeIndex properly. The date and time strings need a space between them, and the format string should match:

format = '%Y-%m-%d %H:%M:%S'
df['Datetime'] = pd.to_datetime(df['date'] + ' ' + df['time'], format=format)
df = df.set_index(pd.DatetimeIndex(df['Datetime']))
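For anyone who wants to check this end to end, here is a minimal sketch. It rebuilds a few rows from the question (trimmed to three rows and four columns for brevity) and assumes both columns hold strings. The names `fmt`, `is_datetime_index` and `lng_std` are just illustrative.

```python
from datetime import time

import pandas as pd

# Sample rows copied from the question (subset of rows and columns).
df = pd.DataFrame({
    "lat": [40.003834, 40.003783, 40.003690],
    "lng": [116.321462, 116.321431, 116.321429],
    "date": ["2008-10-24", "2008-10-24", "2008-10-24"],
    "time": ["04:12:35", "04:12:40", "04:12:45"],
})

# Join date and time with a space so the strings match the format below.
fmt = "%Y-%m-%d %H:%M:%S"
df["Datetime"] = pd.to_datetime(df["date"] + " " + df["time"], format=fmt)
df = df.set_index("Datetime").drop(columns=["date", "time"])

# With a DatetimeIndex in place, between_time can filter by time of day.
is_datetime_index = isinstance(df.index, pd.DatetimeIndex)
lng_std = df.between_time(time(1), time(22, 59, 59))["lng"].std()
```

If `is_datetime_index` comes out `False`, the conversion most likely left the values as plain strings, which is what triggers the `TypeError` in the question.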

To simplify Kirubaharan's answer a bit:

df['Datetime'] = pd.to_datetime(df['date'] + ' ' + df['time'])
df = df.set_index('Datetime')

And to get rid of unwanted columns (as OP did but did not specify per se in the question):

df = df.drop(['date','time'], axis=1)
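A possible variation, sketched here on a cut-down frame and again assuming string columns: `DataFrame.pop` removes a column and returns it, so the combine and drop steps can be folded together. The variable name `combined` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "lng": [116.321462, 116.321431],
    "date": ["2008-10-24", "2008-10-24"],
    "time": ["04:12:35", "04:12:40"],
})

# pop() removes each column from df and returns it,
# so no separate drop() call is needed afterwards.
combined = df.pop("date") + " " + df.pop("time")

# Build the index explicitly so its type and name are unambiguous.
df.index = pd.DatetimeIndex(pd.to_datetime(combined), name="Datetime")
```

Whether this reads better than `set_index` followed by `drop` is a matter of taste; the result should be the same.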

This worked best for me:

format = '%Y-%m-%d%H:%M:%S'
df['Datetime'] = pd.to_datetime(df['date'] + df['time'].astype("string"), format=format)

In some cases Python treats df['date'] as a column of integers.
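To illustrate the integer case, here is a rough sketch with made-up data in which the date was read as an integer such as 20081024. That layout is an assumption, not something shown in the question. Casting both columns to `str` before concatenating avoids the type error from adding an integer column to a string column.

```python
import pandas as pd

# Hypothetical data: the date column was parsed as integers (e.g. 20081024).
df = pd.DataFrame({
    "date": [20081024, 20081024],
    "time": ["04:12:35", "04:12:40"],
})

# Cast both parts to str so the concatenation is string + string.
stamp = df["date"].astype(str) + " " + df["time"].astype(str)

# The format has no dashes because the integer dates have none.
df["Datetime"] = pd.to_datetime(stamp, format="%Y%m%d %H:%M:%S")
df = df.set_index("Datetime")
```

Checking `df.dtypes` before combining the columns is a quick way to see whether a cast like this is needed.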