Find the end of the month for a Pandas DataFrame Series

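I have a series within a DataFrame that is initially read in as an object, and I then need to convert it to a date of the form yyyy-mm-dd, where dd is the end of the month.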

As an example, I have a DataFrame df with a column Date as an object:

...      Date    ...
...     200104   ...
...     200508   ...

What I want is a date object:

...      Date    ...
...  2001-04-30  ...
...  2005-08-31  ...

such that df['Date'].item() returns

datetime.date(2001, 4, 30)

I have used the following code to achieve this, but all my dates land at the beginning of the month, not the end. Please advise.

df['Date'] = pd.to_datetime(df['Date'], format="%Y%m").dt.date

Note: I have already imported Pandas (pd) and datetime (dt).


You can use pandas.tseries.offsets.MonthEnd:

from pandas.tseries.offsets import MonthEnd


df['Date'] = pd.to_datetime(df['Date'], format="%Y%m") + MonthEnd(0)

The 0 in MonthEnd(0) rolls the date forward to the end of the given month, leaving it unchanged if it already falls on a month end. With MonthEnd(1), a date that is already a month end moves to the end of the next month, while any other date still moves to the end of its own month. If you wanted the last day of the next month, you'd then add an extra MonthEnd(1), and so on. This works for any month, so you don't need to know the number of days in the month, or anything like that. More offset information can be found in the documentation.

Example usage and output:

df = pd.DataFrame({'Date': [200104, 200508, 201002, 201602, 199912, 200611]})
df['EndOfMonth'] = pd.to_datetime(df['Date'], format="%Y%m") + MonthEnd(0)


     Date EndOfMonth
0  200104 2001-04-30
1  200508 2005-08-31
2  201002 2010-02-28
3  201602 2016-02-29
4  199912 1999-12-31
5  200611 2006-11-30
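The question asked for datetime.date objects rather than Timestamps. A minimal sketch of that last step, assuming the column holds integers such as 200104 (the astype(str) call is a precaution I have added, not part of the answer above):

```python
import datetime

import pandas as pd
from pandas.tseries.offsets import MonthEnd

df = pd.DataFrame({'Date': [200104, 200508]})

# Parse as the first day of each month, then roll forward to the month end.
# astype(str) is an assumption here: it makes the "%Y%m" parse explicit.
month_end = pd.to_datetime(df['Date'].astype(str), format="%Y%m") + MonthEnd(0)

# .dt.date turns each Timestamp into a plain datetime.date (object dtype).
df['Date'] = month_end.dt.date
```

After this, df['Date'].iloc[0] is datetime.date(2001, 4, 30). Note that the column is then of object dtype, so the .dt accessor is no longer available on it.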

Agreed that root offers the right method. However, readers who blindly use MonthEnd(1) are in for a surprise if they use the last date of the month as an input:

In [4]: pd.Timestamp('2014-01-01') + MonthEnd(1)
Out[4]: Timestamp('2014-01-31 00:00:00')


In [5]: pd.Timestamp('2014-01-31') + MonthEnd(1)
Out[5]: Timestamp('2014-02-28 00:00:00')

Using MonthEnd(0) instead gives this:

In [7]: pd.Timestamp('2014-01-01') + MonthEnd(0)
Out[7]: Timestamp('2014-01-31 00:00:00')


In [8]: pd.Timestamp('2014-01-31') + MonthEnd(0)
Out[8]: Timestamp('2014-01-31 00:00:00')
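For what it's worth, the same "only move if not already at month end" behaviour can also be spelled with the offset's rollforward method. This is just a sketch for comparison; the variable names are mine:

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

first = pd.Timestamp('2014-01-01')
last = pd.Timestamp('2014-01-31')

# rollforward leaves a date alone when it is already on the offset,
# which matches what adding MonthEnd(0) does.
rolled_first = MonthEnd().rollforward(first)
rolled_last = MonthEnd().rollforward(last)

# Adding MonthEnd(1) to a month-end date jumps to the next month's end.
shifted_last = last + MonthEnd(1)
```

Here rolled_first and rolled_last both land on 2014-01-31, while shifted_last lands on 2014-02-28.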

Example to obtain the month end as a string:

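import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Format the current month's last day; the time part is a fixed literal
(pd.Timestamp.now() + MonthEnd(0)).strftime('%Y-%m-%dT00:00:00')
# e.g. '2014-01-31T00:00:00' when run in January 2014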

The end of the month can be the last day/minute/second/millisecond/microsecond/nanosecond of the month depending upon the offset needed by your use case. Given a date, to derive the last unit of the month, use the applicable anchored offset semantics. For example:

import pandas as pd


def last_second_of_month(date: str) -> str:
    return str(pd.Timestamp(date) + pd.offsets.MonthBegin() - pd.offsets.Second())

As needed, replace Second() above with Day(), Minute(), Milli(), Micro(), or Nano().

Here is an alternative implementation with the same result:

import pandas as pd


def last_second_of_month(date: str) -> str:
    return str((pd.Timestamp(date) + pd.offsets.MonthEnd(0)).date()) + " 23:59:59"

Examples:

>>> last_second_of_month('2020-10')
'2020-10-31 23:59:59'
>>> last_second_of_month('2020-10-01')
'2020-10-31 23:59:59'
>>> last_second_of_month('2020-10-15')
'2020-10-31 23:59:59'
>>> last_second_of_month('2020-10-30')
'2020-10-31 23:59:59'
>>> last_second_of_month('2020-10-31')
'2020-10-31 23:59:59'
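The earlier remark about swapping Second() for another offset could be sketched roughly as below. The function name last_unit_of_month and its unit parameter are hypothetical, introduced only for illustration, and the sketch assumes the input carries no time-of-day component:

```python
import pandas as pd


def last_unit_of_month(date: str, unit=pd.offsets.Second()) -> pd.Timestamp:
    # Hypothetical helper: step to the first day of the next month,
    # then step back by one `unit` (Second, Minute, Day, ...).
    return pd.Timestamp(date) + pd.offsets.MonthBegin() - unit


last_second = last_unit_of_month('2020-10-15')
last_minute = last_unit_of_month('2020-10-15', pd.offsets.Minute())
last_day = last_unit_of_month('2020-10-31', pd.offsets.Day())
```

With these inputs, last_second is 2020-10-31 23:59:59, last_minute is 2020-10-31 23:59:00, and last_day is 2020-10-31 00:00:00.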

As a cautionary note, do not use pd.Timestamp(date) + pd.offsets.MonthEnd() + pd.offsets.Day() - pd.offsets.Second(), as it does not work as required when the input is the last date of a month: MonthEnd() then moves to the end of the following month. This observation about pd.offsets.MonthEnd(1) is credited to the answer by Martien.
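To make the pitfall concrete, here is a small comparison of the two expressions on a month-end input (the variable names are mine, for illustration only):

```python
import pandas as pd

ts = pd.Timestamp('2020-10-31')

# MonthEnd() on a month-end date jumps to 2020-11-30; adding a day and
# subtracting a second then yields the last second of November, not October.
bad = ts + pd.offsets.MonthEnd() + pd.offsets.Day() - pd.offsets.Second()

# MonthBegin() moves to 2020-11-01, so stepping back one second stays in October.
good = ts + pd.offsets.MonthBegin() - pd.offsets.Second()
```

Here bad is 2020-11-30 23:59:59, while good is 2020-10-31 23:59:59.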