I currently have a pandas Series with dtype Timestamp, and I want to group it by date (each group will contain many rows with different times).
The seemingly obvious way of doing this would be something like
grouped = s.groupby(lambda x: x.date())
However, pandas' groupby groups a Series by its index. How can I make it group by value instead?
You should convert it to a DataFrame, then add a column that is the date(). You can do groupby on the DataFrame with the date column.
df = pandas.DataFrame(s, columns=["datetime"])
df["date"] = df["datetime"].apply(lambda x: x.date())
df.groupby("date")
Then "date" becomes your index. You have to do it this way because the final grouped object needs an index so you can do things like select a group.
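As a rough, self-contained sketch of the DataFrame approach above: the sample timestamps are made up for illustration, and `s.to_frame(name=...)` and the vectorized `.dt.date` accessor are swapped in for the `DataFrame(...)` constructor and the `apply` with a lambda, on the assumption that the Series has a datetime64 dtype.

```python
import pandas as pd

# Hypothetical sample data: four timestamps spread over two calendar days.
s = pd.Series(pd.to_datetime([
    "2024-01-01 09:00", "2024-01-01 17:30",
    "2024-01-02 08:15", "2024-01-02 23:59",
]))

# Turn the Series into a one-column DataFrame, then add the date column.
df = s.to_frame(name="datetime")
df["date"] = df["datetime"].dt.date  # same result as apply(lambda x: x.date())

# Group on the new column; "date" becomes the index of the result.
counts = df.groupby("date").size()
print(counts)
```

Individual groups can then be selected with `df.groupby("date").get_group(some_date)`, where `some_date` is a `datetime.date` object.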
grouped = s.groupby(s)
Or:
grouped = s.groupby(lambda x: s[x])
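Note that `s.groupby(s)` groups on the exact values, so with timestamps every distinct time would form its own group. A possible adaptation for the original question, assuming a datetime64 Series and made-up sample data, is to pass a key derived from the values instead:

```python
import pandas as pd

# Hypothetical sample data: four timestamps spread over two calendar days.
s = pd.Series(pd.to_datetime([
    "2024-01-01 09:00", "2024-01-01 17:30",
    "2024-01-02 08:15", "2024-01-02 23:59",
]))

# The key is a Series aligned with s on its index, holding each row's date.
grouped = s.groupby(s.dt.date)

sizes = grouped.size()       # number of rows per date
first_times = grouped.min()  # earliest timestamp per date
print(sizes)
print(first_times)
```

No intermediate DataFrame is needed here, because groupby accepts any array-like or Series of keys with the same length as the data.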
Three methods:
DataFrame (where df is a DataFrame): df.groupby(['column']).size()
Series: sel.groupby(sel).size()
Series to DataFrame:
pd.DataFrame(sel, columns=['column']).groupby(['column']).size()
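A small sketch comparing the Series form with the DataFrame form, on the assumption that the DataFrame variant is called on a DataFrame object (named `df` here). The data in `sel` is hypothetical, and `to_frame` is used for the conversion.

```python
import pandas as pd

# Hypothetical data with repeated values.
sel = pd.Series(["a", "b", "a", "c", "a"])

# Series: group the Series by its own values.
by_series = sel.groupby(sel).size()

# Series to DataFrame: name the column, then group on it.
df = sel.to_frame(name="column")
by_frame = df.groupby(["column"]).size()

print(by_series)
print(by_frame)
```

Both results hold the same counts; the only difference is that the DataFrame version labels the resulting index "column".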
For anyone else who wants to do this inline without throwing a lambda in (which tends to kill performance):
s.to_frame(0).groupby(0)[0]
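To illustrate what that one-liner returns, here is a minimal sketch with made-up integer data. `to_frame(0)` names the single column 0, `groupby(0)` groups on it, and `[0]` selects that same column from each group.

```python
import pandas as pd

# Hypothetical data with repeated values.
s = pd.Series([3, 1, 3, 2, 3])

# A SeriesGroupBy object keyed by the values of s.
grouped = s.to_frame(0).groupby(0)[0]

counts = grouped.count()  # how many times each value occurs
print(counts)
```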
To add another suggestion, I often use the following as it uses simple logic:
pd.Series(index=s.values).groupby(level=0)
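A sketch of how this index-based trick might be applied to the original question. The sample timestamps are invented, `dt.normalize()` is my addition to truncate each timestamp to midnight, and the explicit `dtype` is there because the default dtype of an empty-valued Series has varied between pandas versions.

```python
import pandas as pd

# Hypothetical sample data: four timestamps spread over two calendar days.
s = pd.Series(pd.to_datetime([
    "2024-01-01 09:00", "2024-01-01 17:30",
    "2024-01-02 08:15", "2024-01-02 23:59",
]))

# Move the (date-truncated) values into the index; the values are all NaN.
placeholder = pd.Series(index=s.dt.normalize().values, dtype="float64")

# Group on the index. size() counts rows, including those holding NaN.
sizes = placeholder.groupby(level=0).size()
print(sizes)
```

Because the placeholder values are all NaN, `size()` is the useful aggregation here; `count()` would report zero for every group.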