pandas date_range: generating monthly data at the beginning of the month

I am trying to generate a date range of monthly data in which the dates always fall at the beginning of the month:

pd.date_range(start='1/1/1980', end='11/1/1991', freq='M')

This generates 1/31/1980, 2/29/1980, and so on. Instead, I just want 1/1/1980, 2/1/1980, ...

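I have seen other questions about generating data that always falls on a specific day of the month, where the answer was that it isn't possible, but the beginning of the month surely must be!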


You can do this by changing the freq argument from 'M' to 'MS':

import pandas as pd

d = pd.date_range(start='1/1/1980', end='11/1/1990', freq='MS')
print(d)

This should now print:

DatetimeIndex(['1980-01-01', '1980-02-01', '1980-03-01', '1980-04-01',
'1980-05-01', '1980-06-01', '1980-07-01', '1980-08-01',
'1980-09-01', '1980-10-01',
...
'1990-02-01', '1990-03-01', '1990-04-01', '1990-05-01',
'1990-06-01', '1990-07-01', '1990-08-01', '1990-09-01',
'1990-10-01', '1990-11-01'],
dtype='datetime64[ns]', length=131, freq='MS', tz=None)

See the offset aliases section of the pandas documentation. It states that 'M' stands for month end frequency, while 'MS' stands for month start frequency.
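To see the two frequencies side by side, here is a minimal sketch over a short range (the dates are picked only for illustration). It passes offset objects rather than the string aliases, on the assumption that this is less sensitive to the pandas version: as far as I know, newer releases prefer 'ME' over 'M' for month end, while 'MS' is unchanged.

```python
import pandas as pd

# Month-end dates: each entry is the last calendar day of its month.
month_ends = pd.date_range(start="1980-01-01", end="1980-04-01",
                           freq=pd.offsets.MonthEnd())

# Month-start dates: each entry is the first calendar day of its month.
month_starts = pd.date_range(start="1980-01-01", end="1980-04-01",
                             freq=pd.offsets.MonthBegin())

print(month_ends)
print(month_starts)
```

The month-end range stops at 1980-03-31 because 1980-04-30 lies past the end date, whereas the month-start range includes 1980-04-01.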

It is worth noting that, when the start date is not itself the first day of a month, the 'MS' option of pandas.date_range() suggested by Dimitris makes the range start at the beginning of the next month, which may not be expected:

start = "2020-03-08"
end = "2021-03-08"
pd.date_range(start, end, freq='MS')

results in

DatetimeIndex(['2020-04-01', '2020-05-01', '2020-06-01', '2020-07-01',
'2020-08-01', '2020-09-01', '2020-10-01', '2020-11-01',
'2020-12-01', '2021-01-01', '2021-02-01', '2021-03-01'],
dtype='datetime64[ns]', freq='MS')

A workaround is to keep only the year and month of the start date (this relies on start being an ISO-formatted string, so that start[:7] is "2020-03"):

pd.date_range(start[:7], end, freq='MS')

will then give

DatetimeIndex(['2020-03-01', '2020-04-01', '2020-05-01', '2020-06-01',
'2020-07-01', '2020-08-01', '2020-09-01', '2020-10-01',
'2020-11-01', '2020-12-01', '2021-01-01', '2021-02-01',
'2021-03-01'],
dtype='datetime64[ns]', freq='MS')
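The slicing trick only works when start is a string in YYYY-MM-DD form. If the start date is already a Timestamp, one possible variant (a sketch, not the only way to do it) is to snap it back to the first day of its month with Timestamp.replace before building the range:

```python
import pandas as pd

start = pd.Timestamp("2020-03-08")
end = pd.Timestamp("2021-03-08")

# Move the start back to the first day of its own month so that
# the 'MS' range includes that month instead of skipping it.
first_of_month = start.replace(day=1)

months = pd.date_range(first_of_month, end, freq="MS")
print(months)
```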

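If you wish to keep the same starting day for each month, you can then add an offset with pd.DateOffset(). Since the range now starts on day 1 and the original start date falls on day 8, the offset is 7 days: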

pd.date_range(start[:7], end, freq='MS') + pd.DateOffset(days=7)

will give

DatetimeIndex(['2020-03-08', '2020-04-08', '2020-05-08', '2020-06-08',
'2020-07-08', '2020-08-08', '2020-09-08', '2020-10-08',
'2020-11-08', '2020-12-08', '2021-01-08', '2021-02-08',
'2021-03-08'],
dtype='datetime64[ns]', freq=None)
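Rather than hard-coding the 7, the offset can be derived from the start date itself. The following is a minimal sketch under the assumption that start is (or can be converted to) a Timestamp; the variable names are only illustrative.

```python
import pandas as pd

start = pd.Timestamp("2020-03-08")
end = pd.Timestamp("2021-03-08")

# First-of-month dates, beginning with the month that contains `start`.
month_starts = pd.date_range(start.replace(day=1), end, freq="MS")

# Shift every date forward by (day of month - 1) days,
# so each entry lands on the same day of the month as `start`.
same_day = month_starts + pd.DateOffset(days=start.day - 1)
print(same_day)
```

Note that this is a fixed shift in days, so it is only safe for days that exist in every month: with a start on the 29th, 30th or 31st, the shifted date spills into the following month for shorter months.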