How to group a pandas Series by its values?

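I currently have a pandas Series of Timestamps, and I want to group it by date (each group contains many rows with different times).

The seemingly obvious way to do this would be something like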

grouped = s.groupby(lambda x: x.date())

However, pandas groupby groups by the index. How can I make it group by the values instead?
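A quick way to see the problem: a function passed to groupby is called with each index label, not with each value. The minimal sketch below assumes a hypothetical sample Series with the default integer index and records what groupby hands to the function (`record` is an illustrative helper, not a pandas API).

```python
import pandas as pd

# Hypothetical sample: timestamps with the default RangeIndex (labels 0..4).
s = pd.Series(pd.to_datetime([
    "2013-01-01 09:00", "2013-01-01 17:30",
    "2013-01-02 08:15", "2013-01-02 12:00", "2013-01-02 23:59",
]))

seen = []

def record(x):
    # Hypothetical helper: remember what groupby passes in, use it as the key.
    seen.append(x)
    return x

s.groupby(record).size()

# The recorded keys are the index labels, not Timestamp values, which is
# why lambda x: x.date() fails on a default integer index.
print(sorted(set(seen)))
```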


Convert it to a DataFrame, then add a column holding the date of each timestamp. You can then call groupby on the DataFrame with that date column.

import pandas as pd

# Build the frame from a dict so the column is named "datetime" regardless of s.name
df = pd.DataFrame({"datetime": s})
# Vectorized date extraction via the .dt accessor (no per-row lambda)
df["date"] = df["datetime"].dt.date
grouped = df.groupby("date")

The "date" values then become the index of any aggregated result, and the group keys let you do things like select a single group with get_group.
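A runnable version of the DataFrame approach, as a sketch on hypothetical sample data: it counts rows per day and then pulls out one day's rows by its date key.

```python
import datetime

import pandas as pd

# Hypothetical sample: several timestamps per day.
s = pd.Series(pd.to_datetime([
    "2013-01-01 09:00", "2013-01-01 17:30",
    "2013-01-02 08:15", "2013-01-02 12:00", "2013-01-02 23:59",
]))

# One column for the original values, one for the derived calendar date.
df = pd.DataFrame({"datetime": s})
df["date"] = df["datetime"].dt.date

grouped = df.groupby("date")

# Aggregations come back indexed by date.
counts = grouped.size()
print(counts)

# A single day's rows can be selected by its key.
day2 = grouped.get_group(datetime.date(2013, 1, 2))
print(day2)
```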

grouped = s.groupby(s)

Or:

grouped = s.groupby(lambda x: s[x])
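Both forms work because groupby accepts a Series of keys aligned on the index, or a function applied to each index label. The sketch below, on hypothetical sample data, checks that the two forms produce the same groups and then applies the same idea to the question's case by passing the date part of each value (`s.dt.date`) as the key. It assumes unique index labels; with duplicate labels the lookup inside the lambda returns a Series rather than a scalar.

```python
import pandas as pd

# Hypothetical sample: timestamps spread over two days, default RangeIndex.
s = pd.Series(pd.to_datetime([
    "2013-01-01 09:00", "2013-01-01 17:30",
    "2013-01-02 08:15", "2013-01-02 12:00", "2013-01-02 23:59",
]))

# Key Series aligned with s on the index: each row keyed by its own value.
# Every timestamp here is distinct, so this gives 5 groups of size 1.
by_value = s.groupby(s).size()

# Lambda form: the function receives each index label and looks up the value.
by_lambda = s.groupby(lambda label: s.loc[label]).size()

# The question's case: group by the date part of each value instead.
by_date = s.groupby(s.dt.date).size()
print(by_date)
```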

Three methods:

DataFrame: df.groupby(['column']).size()

Series: sel.groupby(sel).size()

Series to DataFrame:

pd.DataFrame({'column': sel}).groupby(['column']).size()
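The three methods can be compared side by side. This is a sketch on a hypothetical Series `sel` of repeated strings; the column name `column` is taken from the answer above.

```python
import pandas as pd

# Hypothetical sample Series with repeated values.
sel = pd.Series(["x", "y", "x", "x", "z"])

# 1. Data already in a DataFrame: group by the column.
df = pd.DataFrame({"column": sel})
sizes_df = df.groupby(["column"]).size()

# 2. A Series grouped by its own values.
sizes_series = sel.groupby(sel).size()

# 3. Wrap the Series in a one-column DataFrame, then group by that column.
sizes_converted = pd.DataFrame({"column": sel}).groupby(["column"]).size()

# All three give the same counts; only the result index name differs.
for sizes in (sizes_df, sizes_series, sizes_converted):
    print(sizes.tolist())
```

Building the frame from a dict avoids a pitfall of `pd.DataFrame(sel, columns=[...])`: when the Series already has a name that differs from the requested column, that constructor selects a missing column and fills it with NaN.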

For anyone else who wants to do this inline without throwing a lambda in (which tends to kill performance):

s.to_frame(0).groupby(0)[0]
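To make the one-liner concrete, here is a sketch on hypothetical data with one repeated timestamp. `to_frame(0)` labels the single column `0`, so `groupby(0)` groups on that column and `[0]` selects it back as a SeriesGroupBy.

```python
import pandas as pd

# Hypothetical sample with a repeated timestamp.
s = pd.Series(pd.to_datetime([
    "2013-01-01 09:00", "2013-01-01 09:00", "2013-01-02 08:15",
]))

# One-column DataFrame (column label 0), grouped by that column, then the
# column is selected back, giving groups keyed by the values themselves.
grouped = s.to_frame(0).groupby(0)[0]

counts = grouped.size()
print(counts)
```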

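Another suggestion, which I often use because the logic is simple: build a Series indexed by the values (passing the original labels as data, so they are not replaced by NaN) and group on that index level:

pd.Series(s.index, index=s.values).groupby(level=0)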
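A sketch of the index-swapping idea on hypothetical data. It assumes you want to know which original labels fall into each value group, so the original index is carried along as the data of the inverted Series.

```python
import pandas as pd

# Hypothetical sample: original labels 10, 11, 12 with a repeated value.
s = pd.Series(["a", "b", "a"], index=[10, 11, 12])

# Values become the index; the original labels are kept as the data.
inverted = pd.Series(s.index.to_numpy(), index=s.values)
grouped = inverted.groupby(level=0)

print(grouped.size())
print(grouped.get_group("a").tolist())
```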