Keep other columns when doing groupby

小开

Best answer

Method #1: use idxmin() to get the index label of the minimum-diff row in each group, then select those rows with .loc:

>>> df.loc[df.groupby("item")["diff"].idxmin()]
   item  diff  otherstuff
1     1     1           2
6     2    -6           2
7     3     0           0

[3 rows x 3 columns]
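A self-contained version of Method #1, for trying it out. The question's own frame is not shown here, so the sample data below is an assumption: it reuses the values printed in a later answer. Note that idxmin() returns only the first matching label when a group has tied minima.

```python
import pandas as pd

# Sample data (assumed): same values as the frame printed in a later answer
df = pd.DataFrame({
    "item": [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "diff": [2, 1, 1, -1, 1, 4, -6, 0, 2],
    "otherstuff": [1, 2, 7, 0, 3, 9, 2, 0, 9],
})

# One index label per item: the first row holding that item's minimum diff
labels = df.groupby("item")["diff"].idxmin()

# .loc returns those whole rows, so every other column is kept
result = df.loc[labels]
print(result)
```

Item 1 has two rows with diff == 1 (labels 1 and 2); only label 1 is returned.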

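Method #2: sort by diff, then take the first row of each item group (GroupBy.first takes the first non-null value of each column, so it equals the first row only when there are no missing values):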

>>> df.sort_values("diff").groupby("item", as_index=False).first()
   item  diff  otherstuff
0     1     1           2
1     2    -6           2
2     3     0           0

[3 rows x 3 columns]

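Note that the resulting indices differ even though the row contents are the same: Method #1 keeps the original index labels, while Method #2 (with as_index=False) returns a new 0-based index.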
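A sketch comparing the two methods side by side, using the same assumed sample frame. It passes kind="stable" to sort_values (not in the original answer) because the default sort is not guaranteed to keep tied rows in their original order, which matters for which tied row comes "first".

```python
import pandas as pd

# Sample data (assumed)
df = pd.DataFrame({
    "item": [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "diff": [2, 1, 1, -1, 1, 4, -6, 0, 2],
    "otherstuff": [1, 2, 7, 0, 3, 9, 2, 0, 9],
})

# Method #1: keeps the original index labels
m1 = df.loc[df.groupby("item")["diff"].idxmin()]

# Method #2: a stable sort keeps ties in original order, then take the first row per item
m2 = df.sort_values("diff", kind="stable").groupby("item", as_index=False).first()

print(m1.index.tolist())  # original labels
print(m2.index.tolist())  # new 0-based labels

# Row contents match once the index is ignored
cols = ["item", "diff", "otherstuff"]
print((m1[cols].to_numpy() == m2[cols].to_numpy()).all())
```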

小开

The above answer works well if each group has only one minimum, or if you only want one. In my case a group could have several rows tied for the minimum, and I wanted all of them, which .idxmin() doesn't give you. This worked:

import pandas as pd


def filter_group(dfg, col):
    # keep every row of this group whose value in `col` equals the group minimum
    return dfg[dfg[col] == dfg[col].min()]


df = pd.DataFrame({'g': ['a'] * 6 + ['b'] * 6, 'v1': (list(range(3)) + list(range(3))) * 2, 'v2': range(12)})
df.groupby('g', group_keys=False).apply(lambda x: filter_group(x, 'v1'))
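A runnable check of the approach above on the same frame, showing that tied minima are all kept. Recent pandas versions may emit a deprecation warning because apply also sees the grouping column; that does not affect which rows are selected.

```python
import pandas as pd


def filter_group(dfg, col):
    # rows of this group whose `col` equals the group's minimum (ties included)
    return dfg[dfg[col] == dfg[col].min()]


df = pd.DataFrame({
    "g": ["a"] * 6 + ["b"] * 6,
    "v1": (list(range(3)) + list(range(3))) * 2,
    "v2": range(12),
})

# group_keys=False keeps the original row labels instead of adding a "g" index level
out = df.groupby("g", group_keys=False).apply(lambda x: filter_group(x, "v1"))
print(out)
```

Each group contains two rows with v1 == 0, so four rows come back (labels 0, 3, 6 and 9).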

As an aside, .filter() is also relevant to this question but didn't work for me.
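A likely reason .filter() doesn't fit here: DataFrameGroupBy.filter calls the function once per group and keeps or drops the whole group based on the single boolean it returns, so it cannot pick individual rows. The sketch below uses the same frame; the per-row alternative shown uses transform('min'), as in a later answer.

```python
import pandas as pd

df = pd.DataFrame({
    "g": ["a"] * 6 + ["b"] * 6,
    "v1": (list(range(3)) + list(range(3))) * 2,
    "v2": range(12),
})

# One bool per group: both groups have min v1 == 0, so every row of both groups is kept
kept = df.groupby("g").filter(lambda x: x["v1"].min() == 0)
print(len(kept))  # 12

# Row-level selection needs a per-row mask instead
rows = df[df["v1"] == df.groupby("g")["v1"].transform("min")]
print(rows.index.tolist())
```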

小开

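You can sort by diff, then use duplicated. It returns a boolean mask that is True for every row after the first in each item group, so invert it to keep the first (minimum-diff) row per item. Items with a single record are kept too, since their only row is never marked as a duplicate:

df_sorted = df.sort_values(by='diff')
df_sorted[~df_sorted.duplicated(subset='item', keep='first')]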
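A self-contained sketch of the sort-then-duplicated idea on the assumed sample frame. duplicated() by itself only returns a mask; the rows are selected with the inverted mask. kind="stable" is an addition so that, among tied diffs, the earlier row wins.

```python
import pandas as pd

# Sample data (assumed)
df = pd.DataFrame({
    "item": [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "diff": [2, 1, 1, -1, 1, 4, -6, 0, 2],
    "otherstuff": [1, 2, 7, 0, 3, 9, 2, 0, 9],
})

# Smallest diff first; stable sort keeps tied rows in their original order
s = df.sort_values(by="diff", kind="stable")

# True for every repeat of an item after its first (minimum-diff) row
mask = s.duplicated(subset="item", keep="first")

# Keep the rows that are NOT duplicates: one minimum row per item
result = s[~mask]
print(result)
```

The result stays ordered by diff; call result.sort_index() to restore the original row order.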

小开

You can use DataFrame.sort_values with DataFrame.drop_duplicates:

df = df.sort_values(by='diff').drop_duplicates(subset='item')
print (df)
   item  diff  otherstuff
6     2    -6           2
7     3     0           0
1     1     1           2
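A self-contained sketch of the sort_values plus drop_duplicates approach on the assumed sample frame. drop_duplicates keeps the first row it sees for each item, which after sorting is the minimum-diff row. The kind="stable" argument is an addition: with the default sort, which of several tied rows comes first is not guaranteed.

```python
import pandas as pd

# Sample data (assumed)
df = pd.DataFrame({
    "item": [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "diff": [2, 1, 1, -1, 1, 4, -6, 0, 2],
    "otherstuff": [1, 2, 7, 0, 3, 9, 2, 0, 9],
})

# First row per item after a stable sort = that item's minimum diff (earliest row on ties)
result = df.sort_values(by="diff", kind="stable").drop_duplicates(subset="item")
print(result)

# Optional: put the rows back in their original order
print(result.sort_index())
```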

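If a group can contain multiple minimal values and you want all of the minimal rows, use boolean indexing with transform('min'), which broadcasts each group's minimum back onto every row of that group. Starting again from the original data: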

print (df)
   item  diff  otherstuff
0     1     2           1
1     1     1           2 <-multiple min
2     1     1           7 <-multiple min
3     2    -1           0
4     2     1           3
5     2     4           9
6     2    -6           2
7     3     0           0
8     3     2           9


print (df.groupby("item")["diff"].transform('min'))
0    1
1    1
2    1
3   -6
4   -6
5   -6
6   -6
7    0
8    0
Name: diff, dtype: int64


df = df[df.groupby("item")["diff"].transform('min') == df['diff']]
print (df)
   item  diff  otherstuff
1     1     1           2
2     1     1           7
6     2    -6           2
7     3     0           0
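The same transform-based filter as one self-contained snippet, built on the frame printed above, so the tie handling can be checked directly:

```python
import pandas as pd

# The frame printed above, with two tied minima for item 1
df = pd.DataFrame({
    "item": [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "diff": [2, 1, 1, -1, 1, 4, -6, 0, 2],
    "otherstuff": [1, 2, 7, 0, 3, 9, 2, 0, 9],
})

# Same length as df: each row gets its item's minimum diff
group_min = df.groupby("item")["diff"].transform("min")

# Keep every row equal to its group minimum, so all ties survive
result = df[df["diff"] == group_min]
print(result)
```

Unlike idxmin() or drop_duplicates, this returns both item 1 rows with diff == 1 and keeps the original index labels.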

小开

I tried everyone's method and couldn't get any of them to work properly, so I did the process step by step and ended up with the correct result:

df.sort_values(by='diff', inplace=True, ignore_index=True)
df.drop_duplicates(subset='item', inplace=True, ignore_index=True)
df.sort_values(by=['item'], inplace=True, ignore_index=True)

For a little more explanation:

Sort by the column whose minimum you want (diff)
Drop duplicates of the grouping column (item); the first row kept for each item is its minimum
Re-sort by item, because the data is still sorted by the minimum values
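A self-contained sketch of the three steps, assuming their intent is: sort by diff, keep the first row per item, then re-sort by item. It uses the assumed sample frame and a stable sort (an addition) so ties resolve to the earlier row.

```python
import pandas as pd

# Sample data (assumed)
df = pd.DataFrame({
    "item": [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "diff": [2, 1, 1, -1, 1, 4, -6, 0, 2],
    "otherstuff": [1, 2, 7, 0, 3, 9, 2, 0, 9],
})

# 1. Sort by the value whose minimum you want
df = df.sort_values(by="diff", kind="stable", ignore_index=True)
# 2. Keep only the first row per item, which now holds that item's minimum diff
df = df.drop_duplicates(subset="item", ignore_index=True)
# 3. Re-sort by item, since the rows are still ordered by diff
df = df.sort_values(by="item", ignore_index=True)
print(df)
```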