How to sort a pandas DataFrame by one column

Best answer

Use sort_values to sort the df by the values of a specific column:

In [18]:
df.sort_values('2')


Out[18]:
        0          1     2
4    85.6    January   1.0
3    95.5   February   2.0
7   104.8      March   3.0
0   354.7      April   4.0
8   283.5        May   5.0
6   238.7       June   6.0
5   152.0       July   7.0
1    55.4     August   8.0
11  212.7  September   9.0
10  249.6    October  10.0
9   278.8   November  11.0
2   176.5   December  12.0

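If you want to sort by two columns, pass a list of column labels to sort_values, with the labels ordered by sort priority. If you use df.sort_values(['2', '0']), the result is sorted by column 2 and then by column 0. Of course, that doesn't really make sense for this example, because every value in df['2'] is unique.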
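The second key only takes effect when the first column has ties. A minimal sketch with made-up rows (the string column labels '0', '1', '2' are an assumption chosen to match the call above):

```python
import pandas as pd

# Hypothetical rows in which column '2' has ties, so the second key matters
df = pd.DataFrame({
    '0': [354.7, 55.4, 176.5, 95.5],
    '1': ['April', 'August', 'December', 'February'],
    '2': [4.0, 8.0, 4.0, 8.0],
})

# Sort by '2' first; rows that tie on '2' are then ordered by '0'
result = df.sort_values(['2', '0'])
print(result)  # index order: 2, 0, 1, 3
```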

Answer

Just adding a few more operations on the data. Suppose we have a DataFrame df; we can chain a few operations to get the desired output:

ID         cost      tax    label
1       216590      1600    test
2       523213      1800    test
3          250      1500    experiment


(df['label'].value_counts().to_frame().reset_index()).sort_values('label', ascending=False)

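will give the labels, sorted by count, as a DataFrame. The output below is from pandas versions before 2.0; from 2.0 on, the resulting columns are named label and count, so the sort key has to be 'count':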

    index   label
0   test        2
1   experiment  1
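Because the column names produced by value_counts().to_frame().reset_index() changed in pandas 2.0, a version-independent variant can name both columns explicitly. A sketch using the sample data above; the column name count is an arbitrary choice:

```python
import pandas as pd

# Sample data from the answer above
df = pd.DataFrame({
    'ID': [1, 2, 3],
    'cost': [216590, 523213, 250],
    'tax': [1600, 1800, 1500],
    'label': ['test', 'test', 'experiment'],
})

# Name the index and the counts column explicitly, so the resulting
# column names do not depend on the pandas version
counts = (
    df['label']
    .value_counts()
    .rename_axis('label')
    .reset_index(name='count')
    .sort_values('count', ascending=False)
)
print(counts)
```

value_counts already returns counts in descending order by default, so the explicit sort_values mainly documents intent.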

Answer

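I tried the solutions above, but they didn't work for me, so I found a different solution that did. ascending=False sorts the DataFrame in descending order; the default is True (ascending). I am using Python 3.6.6 and pandas 0.23.4.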

final_df = df.sort_values(by=['2'], ascending=False)

You can find more details in the pandas documentation.

Answer

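As an alternative solution:

Instead of creating a second column, you can make your string data (the month names) categorical and sort by it like this: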

df.rename(columns={1:'month'},inplace=True)
df['month'] = pd.Categorical(df['month'],categories=['December','November','October','September','August','July','June','May','April','March','February','January'],ordered=True)
df = df.sort_values('month',ascending=False)

This gives you the data ordered by month name, in the order you specified when creating the Categorical object.
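A self-contained sketch of the same idea, assuming integer column labels 0, 1, 2 as in the question and using only a few of its rows. Listing the categories in calendar order lets a plain ascending sort do the job, without reversing the list and passing ascending=False:

```python
import pandas as pd

# A few rows shaped like the question's data (integer column labels assumed)
df = pd.DataFrame({0: [354.7, 85.6, 176.5, 95.5],
                   1: ['April', 'January', 'December', 'February'],
                   2: [4.0, 1.0, 12.0, 2.0]})

df = df.rename(columns={1: 'month'})

# Calendar order; ordered=True makes sorting follow this order, not the alphabet
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
          'August', 'September', 'October', 'November', 'December']
df['month'] = pd.Categorical(df['month'], categories=months, ordered=True)

df = df.sort_values('month')
print(df)
```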

Answer

Here is the sort_values signature according to the pandas documentation:

DataFrame.sort_values(by, axis=0,
ascending=True,
inplace=False,
kind='quicksort',
na_position='last',
ignore_index=False, key=None)

In this case, you pass the label of the column to sort by (here '2') as the by argument.

API reference: pandas.DataFrame.sort_values
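The key parameter in that signature (available since pandas 1.1) offers another way to get a custom order without changing the column's dtype. A sketch; the month-to-number mapping is illustrative and not part of the question:

```python
import pandas as pd

df = pd.DataFrame({'month': ['March', 'January', 'December', 'February']})

# Illustrative lookup table: month name -> position in the calendar
order = {m: i for i, m in enumerate(
    ['January', 'February', 'March', 'April', 'May', 'June', 'July',
     'August', 'September', 'October', 'November', 'December'])}

# key is applied to the 'month' column before sorting and must return
# a Series of the same length (pandas >= 1.1)
result = df.sort_values(by='month', key=lambda s: s.map(order))
print(result)
```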

Answer

Using the column name worked for me:

sorted_df = df.sort_values(by=['Column_name'], ascending=True)

Answer

This worked for me:

df.sort_values(by='Column_name', inplace=True, ascending=False)
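One detail worth checking when choosing between assigning the result and inplace=True: with inplace=True the method returns None. A small sketch with a hypothetical Column_name column:

```python
import pandas as pd

df = pd.DataFrame({'Column_name': [3, 1, 2]})

# Without inplace: a new sorted DataFrame is returned and df is left as it was
sorted_df = df.sort_values(by='Column_name', ascending=False)
unchanged = df['Column_name'].tolist()  # still [3, 1, 2]

# With inplace=True: df itself is reordered and the call returns None,
# so df = df.sort_values(..., inplace=True) would leave df set to None
ret = df.sort_values(by='Column_name', inplace=True, ascending=False)

print(sorted_df)
print(ret)  # None
```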

Answer

pandas.DataFrame.sort_values will do the job.

One can pass various parameters, such as ascending (bool or list of bool):

Sort ascending vs. descending. Specify a list for multiple sort orders. If this is a list of bools, it must match the length of by.
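For example, a sketch with two hypothetical columns a and b, sorting a ascending and, within ties, b descending:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 1, 2], 'b': [10, 20, 30, 40]})

# One bool per column in `by`: 'a' ascending, then 'b' descending within ties
result = df.sort_values(by=['a', 'b'], ascending=[True, False])
print(result)  # index order: 2, 0, 3, 1
```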

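Since the default is ascending, and the OP's goal is to sort in ascending order, there is no need to specify that parameter (see the last note below for how to sort in descending order). One can therefore use one of the following:

Perform the operation in place and keep the same variable name. This requires passing inplace=True, as follows: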

df.sort_values(by=['2'], inplace=True)


# or


df.sort_values(by = '2', inplace = True)


# or


df.sort_values('2', inplace = True)

If an in-place operation is not required, one can assign the change (the sort) to a variable:
- using the same name as the original DataFrame, df, as
```
df = df.sort_values(by=['2'])
```
- using a different name, such as df_new, as
```
df_new = df.sort_values(by=['2'])
```

All of the previous operations give the following output:

        0          1     2
4    85.6    January   1.0
3    95.5   February   2.0
7   104.8      March   3.0
0   354.7      April   4.0
8   283.5        May   5.0
6   238.7       June   6.0
5     152       July   7.0
1    55.4     August   8.0
11  212.7  September   9.0
10  249.6    October  10.0
9   278.8   November  11.0
2   176.5   December  12.0

Finally, one can reset the index with pandas.DataFrame.reset_index to get the following:

df.reset_index(drop = True, inplace = True)


# or


df = df.reset_index(drop = True)


[Out]:


        0          1     2
0    85.6    January   1.0
1    95.5   February   2.0
2   104.8      March   3.0
3   354.7      April   4.0
4   283.5        May   5.0
5   238.7       June   6.0
6     152       July   7.0
7    55.4     August   8.0
8   212.7  September   9.0
9   249.6    October  10.0
10  278.8   November  11.0
11  176.5   December  12.0

A one-liner that sorts in ascending order and resets the index looks like this:

df = df.sort_values(by=['2']).reset_index(drop = True)


[Out]:


        0          1     2
0    85.6    January   1.0
1    95.5   February   2.0
2   104.8      March   3.0
3   354.7      April   4.0
4   283.5        May   5.0
5   238.7       June   6.0
6     152       July   7.0
7    55.4     August   8.0
8   212.7  September   9.0
9   249.6    October  10.0
10  278.8   November  11.0
11  176.5   December  12.0
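Since pandas 1.0, sort_values also accepts ignore_index=True, which produces the fresh 0..n-1 index in the same call. A sketch comparing both forms on a small made-up column:

```python
import pandas as pd

df = pd.DataFrame({'2': [4.0, 8.0, 12.0, 2.0, 1.0]})

# Two ways to get a sorted frame with a fresh 0..n-1 index
a = df.sort_values(by=['2']).reset_index(drop=True)
b = df.sort_values(by=['2'], ignore_index=True)  # needs pandas >= 1.0

print(a.equals(b))  # True
```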

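Notes:

If one is not doing the operation in place, forgetting to assign the result as described above may lead one (as happened to one user) not to get the expected result.
There are strong opinions about whether to use inplace at all; one may want to read up on that debate.
All of the above assumes that column 2 is not of string type. If it is, one must convert it first:
- using pandas.to_numeric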
```
 df['2'] = pd.to_numeric(df['2'])
```
- using astype
```
 df['2'] = df['2'].astype(float)
```
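Why the conversion matters: strings are compared character by character, so a text column of numbers sorts as '1', '10', '12', '2'. A minimal sketch with made-up values:

```python
import pandas as pd

# Hypothetical column '2' stored as strings
df = pd.DataFrame({'2': ['10', '2', '1', '12']})

# Strings sort lexicographically: '1' < '10' < '12' < '2'
as_text = df.sort_values(by=['2'])['2'].tolist()

# After conversion the sort is numeric
df['2'] = pd.to_numeric(df['2'])
as_numbers = df.sort_values(by=['2'])['2'].tolist()

print(as_text)     # ['1', '10', '12', '2']
print(as_numbers)  # [1, 2, 10, 12]
```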

If one wants to sort in descending order, one needs to pass ascending=False, as in:

df = df.sort_values(by=['2'], ascending=False)


# or


df.sort_values(by = '2', ascending=False, inplace=True)


[Out]:


        0          1     2
2   176.5   December  12.0
9   278.8   November  11.0
10  249.6    October  10.0
11  212.7  September   9.0
1    55.4     August   8.0
5     152       July   7.0
6   238.7       June   6.0
8   283.5        May   5.0
0   354.7      April   4.0
7   104.8      March   3.0
3    95.5   February   2.0
4    85.6    January   1.0

Answer

This one worked for me:

df=df.sort_values(by=[2])

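whereas:

df=df.sort_values(by=['2'])

did not work, presumably because the column labels in that DataFrame are the integers 0, 1, 2 rather than the strings '0', '1', '2'.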
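Whether 2 or '2' is right depends on the actual column labels. A sketch, assuming the DataFrame was built from plain lists (so pandas assigns integer labels):

```python
import pandas as pd

# Built from lists, so pandas assigns integer column labels 0, 1, 2
df = pd.DataFrame([[354.7, 'April', 4.0], [85.6, 'January', 1.0]])
print(df.columns.tolist())  # [0, 1, 2]

by_int = df.sort_values(by=[2])  # works: the integer 2 is a column label

# The string '2' is not a label of this DataFrame, so this raises KeyError
try:
    df.sort_values(by=['2'])
    raised = False
except KeyError:
    raised = True

# Converting the labels to strings makes the quoted form work instead
df.columns = df.columns.astype(str)
by_str = df.sort_values(by=['2'])
```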

Answer

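Example: suppose you have a column with values 1 and 0, and you want to separate them and work with only one of the values. Then:

import pandas as pan

# furniture is one of the columns in the csv file;
# data is the DataFrame loaded from that file (e.g. with pan.read_csv)
allrooms = data.groupby('furniture')['furniture'].agg('count')
allrooms

myrooms1 = pan.DataFrame(allrooms, columns = ['furniture'], index = [1])
myrooms2 = pan.DataFrame(allrooms, columns = ['furniture'], index = [0])

print(myrooms1)
print(myrooms2)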
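If the goal is to work with the rows for one value rather than just the counts, a boolean mask is a common alternative to building DataFrames from the grouped counts. A minimal sketch with made-up data (the room column is hypothetical):

```python
import pandas as pd

# Hypothetical data: 'furniture' is 1 for furnished rooms, 0 otherwise
data = pd.DataFrame({'room': ['a', 'b', 'c', 'd', 'e'],
                     'furniture': [1, 0, 1, 1, 0]})

# How many rows have each value (same information as the groupby count)
counts = data['furniture'].value_counts()

# Boolean masks pull out the rows for a single value
furnished = data[data['furniture'] == 1]
unfurnished = data[data['furniture'] == 0]

print(counts)
print(furnished)
print(unfurnished)
```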

Answer

You may need to reset the index after sorting:

df = df.sort_values('2')
df = df.reset_index(drop=True)

Answer

Just adding some more insight:

df = raw_df['2'].sort_values()  # sorts only column '2' and returns a Series

But:

df = raw_df.sort_values(by=["2"], ascending=False)  # sorts the whole df in descending order based on column "2"
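The difference shows up in the return type. A sketch with made-up values:

```python
import pandas as pd

raw_df = pd.DataFrame({'1': ['April', 'January', 'March'], '2': [4.0, 1.0, 3.0]})

col = raw_df['2'].sort_values()                        # a Series: only column '2'
whole = raw_df.sort_values(by=['2'], ascending=False)  # a DataFrame: all columns, rows reordered

print(type(col).__name__)    # Series
print(type(whole).__name__)  # DataFrame
```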

Answer

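If you want to sort by a column dynamically, in an order that is not alphabetical, and don't want to sort directly on the column's own values with sort_values(), you can try the solution below.

Problem: sort the column "col1" in this sequence: ['A', 'C', 'D', 'B']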

import pandas as pd
import numpy as np

## Sample DataFrame ##
df = pd.DataFrame({'col1': ['A', 'B', 'D', 'C', 'A']})

>>> df
  col1
0    A
1    B
2    D
3    C
4    A

## Solution ##
conditions = []
values = []

# Map each label to its position in the desired order
for i, j in enumerate(['A', 'C', 'D', 'B']):
    conditions.append(df['col1'] == j)
    values.append(i)

df['col1_Num'] = np.select(conditions, values)

df.sort_values(by='col1_Num', inplace=True)

>>> df
  col1  col1_Num
0    A         0
4    A         0
3    C         1
2    D         2
1    B         3
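A variant of the same approach, sketched with Series.map and a dictionary instead of the loop and np.select. One reason to consider it: np.select fills unmatched rows with its default of 0, the same rank as 'A', whereas map yields NaN, which sort_values places last by default. The extra label 'E' below is made up to show this:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'B', 'D', 'C', 'A', 'E']})

# Desired order; 'E' is deliberately missing from it
order = ['A', 'C', 'D', 'B']
rank = {label: i for i, label in enumerate(order)}

# Labels not in `order` map to NaN, which sort_values puts last by default;
# kind='stable' keeps tied rows (the two 'A's) in their original order
df['col1_Num'] = df['col1'].map(rank)
df = df.sort_values(by='col1_Num', kind='stable')

print(df)
```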