Python Pandas: Group datetime column into hour and minute aggregations

This seems like it would be fairly straightforward, but after nearly an entire day I have not found the solution. I've loaded my DataFrame with read_csv and easily parsed, combined, and indexed a date and a time column into one column, but now I want to be able to reshape and perform calculations based on hour and minute groupings, similar to what you can do in an Excel pivot.

I know how to resample to hour or minute, but it maintains the date portion associated with each hour/minute, whereas I want to aggregate the data set ONLY by hour and minute, similar to grouping in Excel pivots and selecting "hour" and "minute" but not selecting anything else.

Any help would be greatly appreciated.

Can't you do, where df is your DataFrame:

times = pd.to_datetime(df.timestamp_col)
df.groupby([times.dt.hour, times.dt.minute]).value_col.sum()
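
To illustrate, here is a minimal, self-contained sketch of that approach; the column names timestamp_col and value_col and the sample data are made up:

import pandas as pd

# Hypothetical data: two different dates share the 13:28 hour/minute
df = pd.DataFrame({
    'timestamp_col': pd.to_datetime(['2022-06-28 13:28:08',
                                     '2022-06-29 13:28:40',
                                     '2022-06-28 14:29:11']),
    'value_col': [15, 30, 20],
})

times = pd.to_datetime(df.timestamp_col)
# Rows from different dates that share the same hour and minute fall
# into the same group, e.g. (13, 28) -> 45
print(df.groupby([times.dt.hour, times.dt.minute]).value_col.sum())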

Came across this when I was searching for this type of groupby. Wes' code above didn't work for me; I'm not sure if it's because of changes in pandas over time.

In pandas 0.16.2, what I did in the end was:

grp = data.groupby(by=[data.datetime_col.map(lambda x: (x.hour, x.minute))])
grp.count()

You'd have (hour, minute) tuples as the grouped index. If you want multi-index:

grp = data.groupby(by=[data.datetime_col.map(lambda x: x.hour),
                       data.datetime_col.map(lambda x: x.minute)])
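
For reference, a small sketch (the frame and column names are made up) showing why the MultiIndex form can be handy: it can be unstacked into a pivot-style table with hours as rows and minutes as columns, much like the Excel pivot the question describes:

import pandas as pd

data = pd.DataFrame({
    'datetime_col': pd.to_datetime(['2022-06-28 13:28:08',
                                    '2022-06-29 13:28:40',
                                    '2022-06-28 14:29:11']),
    'value': [15, 30, 20],
})

# Renaming the mapped series gives the index levels readable names
grp = data.groupby(by=[data.datetime_col.map(lambda x: x.hour).rename('hour'),
                       data.datetime_col.map(lambda x: x.minute).rename('minute')])

# Unstack the minute level into columns: a pivot-like hour x minute table
print(grp['value'].sum().unstack('minute'))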

Wes' code didn't work for me. But the DatetimeIndex constructor (docs) did:

times = pd.DatetimeIndex(df.datetime_col)
grouped = df.groupby([times.hour, times.minute])

The DatetimeIndex object is a representation of times in pandas. The first line creates an array of the datetimes. The second line uses this array to get the hour and minute values for all of the rows, allowing the data to be grouped (docs) by these values.
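
A minimal runnable sketch of that approach; the column names and sample data here are assumptions, not from the original answer:

import pandas as pd

df = pd.DataFrame({
    'datetime_col': pd.to_datetime(['2022-06-28 13:28:08',
                                    '2022-06-29 13:28:40',
                                    '2022-06-28 14:29:11']),
    'value': [15, 30, 20],
})

# DatetimeIndex exposes .hour and .minute as arrays, one entry per row
times = pd.DatetimeIndex(df.datetime_col)
# The date part is discarded by the grouping: 13:28 on both days -> 45
print(df.groupby([times.hour, times.minute])['value'].sum())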

I have an alternative to Wes' & Nix's answers above, with just one line of code. Assuming your column is already a datetime column, you don't need to get the hour and minute attributes separately:

df.groupby(df.timestamp_col.dt.time).value_col.sum()
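
Note that .dt.time keeps seconds and microseconds, so rows only collapse together when the full time of day matches. A small sketch (column names made up) of one way to floor to the minute first if you want a pure hour-and-minute grouping:

import pandas as pd

df = pd.DataFrame({
    'timestamp_col': pd.to_datetime(['2022-06-28 13:28:08',
                                     '2022-06-29 13:28:09']),
    'value_col': [15, 30],
})

# Floor to the minute so 13:28:08 and 13:28:09 end up in the same group
print(df.groupby(df.timestamp_col.dt.floor('min').dt.time).value_col.sum())
# 13:28:00 -> 45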

This might be a little late, but I found quite a good solution for anyone who has the same problem. I have a df like this:

datetime              value
2022-06-28 13:28:08   15
2022-06-28 13:28:09   30
...                   ...
2022-06-28 14:29:11   20
2022-06-28 14:29:12   10

I want to convert those timestamps, which are at one-second intervals, to timestamps at one-minute intervals, summing the value column in the process. There is a neat way of doing it:

df['datetime'] = pd.to_datetime(df['datetime'])  # if not already a datetime object
grouped = df.groupby(pd.Grouper(key='datetime', axis=0, freq='T')).sum()
print(grouped.head())

Result:

datetime              value
2022-06-28 13:28:00   45
...                   ...
2022-06-28 14:29:00   30

freq='T' stands for minutes. You could also group by hours or days; these frequency strings are called offset aliases.
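
For instance, on the frame above, swapping the alias groups the same data into hourly buckets instead (a sketch of the idea):

# Every timestamp within the same clock hour is collapsed into one row,
# with the value column summed; 'H' is the hourly offset alias
hourly = df.groupby(pd.Grouper(key='datetime', freq='H')).sum()
print(hourly.head())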