Group by week in pandas

I have a dataframe:

Name   Date    Quantity
Apple  07/11/17  20
orange 07/14/17  20
Apple  07/14/17  70
Orange 07/25/17  40
Apple  07/20/17  30

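I want to aggregate this by Name and Date to get the sum of quantities. Details:

Date: group by week; the result should be labeled with the start of the week (i.e. the Monday).

Quantity: sum, if two or more records have the same Name and fall within the same week.

Expected output: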

Name   Date    Quantity
Apple  07/10/17  90
orange 07/10/17  20
Apple  07/17/17  30
orange 07/24/17  40

Thanks in advance.


Let's use groupby, resample with W-Mon, and sum:

df.groupby('Name').resample('W-Mon', on='Date').sum().reset_index().sort_values(by='Date')

Output (each week is labeled with the Monday that ends it):

     Name       Date  Quantity
0   Apple 2017-07-17        90
3  orange 2017-07-17        20
1   Apple 2017-07-24        30
2  Orange 2017-07-31        40

First, convert the Date column with to_datetime and subtract one week. W-MON labels each weekly bin with the Monday that ends it, and we want the Monday that starts the week instead.

Then use groupby with a Grouper on W-MON and aggregate with sum:

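import pandas as pd

# Shift dates back one week so each W-MON bin label becomes the Monday before the original date
df['Date'] = pd.to_datetime(df['Date']) - pd.to_timedelta(7, unit='d')
df = (df.groupby(['Name', pd.Grouper(key='Date', freq='W-MON')])['Quantity']
        .sum()
        .reset_index()
        .sort_values('Date'))
print(df)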
     Name       Date  Quantity
0   Apple 2017-07-10        90
3  orange 2017-07-10        20
1   Apple 2017-07-17        30
2  Orange 2017-07-24        40
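As a possible alternative to shifting the dates, the bins themselves can be closed and labeled on their left edge. The sketch below rebuilds the question's sample data (the explicit date format is an assumption about how the strings should be parsed) and passes closed='left' and label='left' to pd.Grouper, so each bin covers Monday up to, but not including, the next Monday. One difference worth noting: with the shift-by-seven-days approach a row dated exactly on a Monday ends up labeled with the previous Monday, whereas here it stays in the week it starts.

```python
import pandas as pd

# Sample data from the question (Name casing kept exactly as given).
df = pd.DataFrame({
    "Name": ["Apple", "orange", "Apple", "Orange", "Apple"],
    "Date": ["07/11/17", "07/14/17", "07/14/17", "07/25/17", "07/20/17"],
    "Quantity": [20, 20, 70, 40, 30],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%y")

# Weekly bins anchored on Mondays; closed="left" and label="left" make each
# bin cover [Monday, next Monday) and carry the starting Monday as its label.
week_start = pd.Grouper(key="Date", freq="W-MON", closed="left", label="left")

weekly = (
    df.groupby(["Name", week_start])["Quantity"]
    .sum()
    .reset_index()
    .sort_values("Date")
)
print(weekly)
```

Because "orange" and "Orange" differ in case, they are treated as separate groups here; normalizing the Name column first (for example with str.lower()) would merge them.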

First convert the date column with to_datetime. The following groups by week, with weeks starting on Monday. The result is keyed by week number (you can change the format; see http://strftime.org/ for the available directives):

df.groupby(['name', df['date'].dt.strftime('%W')])['quantity'].sum()

Output:

name    date
apple   28      90
        29      30
orange  28      20
        30      40

This groups every row on the previous Monday (if the date is already Monday, nothing is changed). This has the effect of grouping by week:

import pandas as pd, datetime as dt


# df = ...


df['WeekDate'] = df.apply(lambda row: row['Date'] - dt.timedelta(days=row['Date'].weekday()), axis=1)


perweek = df['WeekDate'].groupby(df['WeekDate']).count()

Example:

Date           WeekDate
2020-06-20     2020-06-15 <- monday
2020-06-21     2020-06-15
2020-06-24     2020-06-22 <- monday
2020-06-25     2020-06-22
2020-06-26     2020-06-22
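The same idea can be written without a row-wise apply. This is only a sketch on the question's sample data, assuming the Date strings are in month/day/two-digit-year form; it subtracts each date's weekday as a timedelta and then sums Quantity per Name and week instead of counting rows.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Name": ["Apple", "orange", "Apple", "Orange", "Apple"],
    "Date": ["07/11/17", "07/14/17", "07/14/17", "07/25/17", "07/20/17"],
    "Quantity": [20, 20, 70, 40, 30],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%y")

# weekday is 0 for Monday, so subtracting it (in days) lands on that week's Monday.
df["WeekDate"] = df["Date"] - pd.to_timedelta(df["Date"].dt.weekday, unit="D")

# Sum the quantities per name and per week-starting Monday.
per_week = df.groupby(["Name", "WeekDate"], as_index=False)["Quantity"].sum()
print(per_week)
```

The vectorized subtraction avoids calling a Python lambda for every row, which tends to matter on larger frames.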

You have already received a lot of good answers and the question is quite old. However, some of the solutions use deprecated functions, and I encountered the same problem and found a different solution, so I think sharing it could be helpful to someone.

Given the dataframe you proposed:

Name   Date    Quantity
Apple  07/11/17  20
orange 07/14/17  20
Apple  07/14/17  70
Orange 07/25/17  40
Apple  07/20/17  30


We have to convert the values in 'Date' to pandas datetimes, since they are strings right now.
Then we can use the Series' dt accessor, which lets us handle datetime-like series and extract information from them.

df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%y')

Having a datetime column allows us to use the dt accessor to extract the week number associated with each date. In order not to lose any information, I prefer to add a new column with the week number. Once we have the week number, we can group by it.

df['WeekNumber'] = df['Date'].dt.isocalendar().week
# Select Quantity explicitly so the datetime column is not summed
df.groupby(['Name', 'WeekNumber'])['Quantity'].sum()


Name    WeekNumber
Apple   28  90
        29  30
Orange  28  20
        30  40

Small problem: what if we consider different years?

Our data may span several years. In that situation we cannot group on the week number alone (otherwise we would mix data from one year with another), so it is useful to also extract the year column from isocalendar().

df['year'] = df['Date'].dt.isocalendar().year
# Select Quantity explicitly so the datetime column is not summed
df.groupby(['Name', 'WeekNumber', 'year'])[['Quantity']].sum()


Name    WeekNumber  year    Quantity
Apple   28          2017    90
        29          2017    30
Orange  28          2017    20
        30          2017    40
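To illustrate why the year should come from isocalendar() rather than from the calendar year, here is a small sketch with three hypothetical dates (they are not from the question). Two of them share ISO week 1 in different years, and the third falls on 1 January but belongs to the last ISO week of the previous year.

```python
import pandas as pd

# Hypothetical dates chosen to show the year boundary behaviour.
dates = pd.Series(pd.to_datetime(["2017-01-01", "2017-01-03", "2018-01-02"]))

# isocalendar() returns a DataFrame with the columns year, week and day.
iso = dates.dt.isocalendar()
print(iso)

# The calendar year can disagree with the ISO year around 1 January.
print(dates.dt.year.tolist())
```

Grouping on the ISO year together with the ISO week keeps 2017-01-03 and 2018-01-02 apart, and keeps 2017-01-01 with the final week of ISO year 2016 rather than with the end of 2017.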

You can use the to_period method to get the date truncated to the first day of the week (or month if you use the period M):

df["Week"] = df["Date"].dt.to_period("W").dt.to_timestamp()
df.groupby(["Name", "Week"])["Quantity"].sum()
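A quick way to check where the weekly periods start is to run the conversion on a few dates. The sketch below uses three hypothetical dates around a week boundary and reads the start of each period through the start_time attribute, used here as an assumed stand-in for the to_timestamp() call above.

```python
import pandas as pd

# Hypothetical dates: a Tuesday, the following Sunday, and the Monday after that.
dates = pd.Series(pd.to_datetime(["2017-07-11", "2017-07-16", "2017-07-17"]))

# "W" periods run Monday through Sunday.
periods = dates.dt.to_period("W")

# start_time gives the first instant of each period, i.e. the Monday.
week_start = periods.dt.start_time
print(week_start)
```

The Tuesday and the Sunday map to the same Monday, while the next Monday opens a new period.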