Update row values where a certain condition is met in pandas


Best answer

If you need to update two columns to the same value, I think you can use loc:

df1.loc[df1['stream'] == 2, ['feat','another_feat']] = 'aaaa'
print(df1)
   stream        feat another_feat
a       1  some_value   some_value
b       2        aaaa         aaaa
c       2        aaaa         aaaa
d       3  some_value   some_value

If you need to update the columns separately, one option is:

df1.loc[df1['stream'] == 2, 'feat'] = 10
print(df1)
   stream        feat another_feat
a       1  some_value   some_value
b       2          10   some_value
c       2          10   some_value
d       3  some_value   some_value
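The two loc assignments above can be tried end to end with a minimal sketch like the one below. The construction of df1 is an assumption (the post never shows how the frame was built), as are the variable name is_stream_2 and the replacement string 'bbbb'.

```python
import pandas as pd

# Hypothetical reconstruction of df1; the original post does not show how it was built.
df1 = pd.DataFrame(
    {
        "stream": [1, 2, 2, 3],
        "feat": ["some_value"] * 4,
        "another_feat": ["some_value"] * 4,
    },
    index=list("abcd"),
)

# Build the boolean mask once and reuse it for both assignments.
is_stream_2 = df1["stream"] == 2

# Update two columns at once for the matching rows.
df1.loc[is_stream_2, ["feat", "another_feat"]] = "aaaa"

# Update a single column for the same rows.
df1.loc[is_stream_2, "feat"] = "bbbb"

print(df1)
```

Storing the mask in a variable keeps the condition in one place when several columns are updated in separate steps.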

Another common option is to use numpy.where:

df1['feat'] = np.where(df1['stream'] == 2, 10,20)
print(df1)
   stream  feat another_feat
a       1    20   some_value
b       2    10   some_value
c       2    10   some_value
d       3    20   some_value
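Unlike loc, numpy.where assigns a value to every row, so the rows that fail the condition are overwritten too (with 20 above). If the existing values should be kept for those rows, one possibility is to pass the column itself as the fallback. The sketch below assumes a small numeric frame and a hypothetical extra column named flag; neither comes from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
df1 = pd.DataFrame(
    {"stream": [1, 2, 2, 3], "feat": [4, 4, 2, 1]},
    index=list("abcd"),
)

cond = df1["stream"] == 2

# Every row receives a value: 10 where the condition holds, 20 elsewhere.
df1["flag"] = np.where(cond, 10, 20)

# Keep the current value where the condition does not hold.
df1["feat"] = np.where(cond, 10, df1["feat"])

print(df1)
```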

Edit: If you need to divide every column except stream in the rows where the condition is True, use:

print(df1)
   stream  feat  another_feat
a       1     4             5
b       2     4             5
c       2     2             9
d       3     1             7

# select all columns except stream
cols = [col for col in df1.columns if col != 'stream']
print(cols)
['feat', 'another_feat']

# df1 / 2 is aligned on index and columns, so only the selected cells are overwritten
df1.loc[df1['stream'] == 2, cols] = df1 / 2
print(df1)
   stream  feat  another_feat
a       1   4.0           5.0
b       2   2.0           2.5
c       2   1.0           4.5
d       3   1.0           7.0
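Here is a self-contained sketch of the same division. It differs from the snippet above in two ways, both of which are my own choices rather than part of the answer: the target columns are cast to float first, because recent pandas versions complain when fractional values are written into integer columns, and the right-hand side is restricted to the same rows and columns as the left-hand side.

```python
import pandas as pd

# Sample data matching the printed frame above.
df1 = pd.DataFrame(
    {"stream": [1, 2, 2, 3], "feat": [4, 4, 2, 1], "another_feat": [5, 5, 9, 7]},
    index=list("abcd"),
)

# All columns except stream.
cols = [col for col in df1.columns if col != "stream"]

# Cast up front so that values such as 2.5 fit the column dtype.
df1 = df1.astype({col: float for col in cols})

# Halve only the rows where stream equals 2.
mask = df1["stream"] == 2
df1.loc[mask, cols] = df1.loc[mask, cols] / 2

print(df1)
```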

If you need to handle multiple conditions, use nested numpy.where (https://numpy.org/doc/stable/reference/generated/numpy.where.html) calls or numpy.select:

import numpy as np
import pandas as pd

df0 = pd.DataFrame({'Col':[5,0,-6]})

# nested np.where: test the first condition, then fall through to the second
df0['New Col1'] = np.where((df0['Col'] > 0), 'Increasing',
                  np.where((df0['Col'] < 0), 'Decreasing', 'No Change'))

# np.select: a list of conditions and a matching list of choices
df0['New Col2'] = np.select([df0['Col'] > 0, df0['Col'] < 0],
                            ['Increasing',  'Decreasing'],
                            default='No Change')

print (df0)
   Col    New Col1    New Col2
0    5  Increasing  Increasing
1    0   No Change   No Change
2   -6  Decreasing  Decreasing
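One detail worth keeping in mind with numpy.select is that when several conditions are True for the same row, the first one in the list is used, so overlapping conditions should be ordered from most to least specific. A small sketch, with thresholds and labels that are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data and labels, chosen only to show overlapping conditions.
df0 = pd.DataFrame({"Col": [15, 5, -6]})

# 15 satisfies both conditions; the first match in the list is used.
conditions = [df0["Col"] > 10, df0["Col"] > 0]
choices = ["Large increase", "Increasing"]

df0["Trend"] = np.select(conditions, choices, default="Not increasing")

print(df0)
```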


You can do the same with .ix, which has since been removed from pandas (deprecated in 0.20, removed in 1.0); with .loc, its label-based replacement, it looks like this:

In [1]: df = pd.DataFrame(np.random.randn(5,4), columns=list('abcd'))

In [2]: df
Out[2]:
          a         b         c         d
0 -0.323772  0.839542  0.173414 -1.341793
1 -1.001287  0.676910  0.465536  0.229544
2  0.963484 -0.905302 -0.435821  1.934512
3  0.266113 -0.034305 -0.110272 -0.720599
4 -0.522134 -0.913792  1.862832  0.314315

In [3]: df.loc[df.a>0, ['b','c']] = 0


In [4]: df
Out[4]:
          a         b         c         d
0 -0.323772  0.839542  0.173414 -1.341793
1 -1.001287  0.676910  0.465536  0.229544
2  0.963484  0.000000  0.000000  1.934512
3  0.266113  0.000000  0.000000 -0.720599
4 -0.522134 -0.913792  1.862832  0.314315

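EDIT

Following the additional information, the expression below returns every column except a, restricted to the rows where the condition is met, with the values halved:

>>> condition = df.a > 0
>>> df[condition][[i for i in df.columns.values if i not in ['a']]].apply(lambda x: x/2)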
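Note that the expression above only returns a new frame and leaves df untouched. If the halved values should be written back, one possible approach is a loc assignment, sketched below. The seeded random generator and the names other_cols, halved and original are assumptions added for reproducibility and checking.

```python
import numpy as np
import pandas as pd

# Seeded generator so the sketch is reproducible (the original used np.random.randn).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((5, 4)), columns=list("abcd"))
original = df.copy()

condition = df["a"] > 0
other_cols = [c for c in df.columns if c != "a"]

# Compute the halved values for the matching rows only.
halved = df.loc[condition, other_cols] / 2

# Write them back; rows that fail the condition and column a stay as they were.
df.loc[condition, other_cols] = halved

print(df)
```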


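Another vectorized solution is to use the mask() method to halve the rows where stream equals 2, then join() the resulting columns to a dataframe that consists of only the stream column: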

cols = ['feat', 'another_feat']
df[['stream']].join(df[cols].mask(df['stream'] == 2, lambda x: x/2))

Or you can update() the original dataframe in place:

df.update(df[cols].mask(df['stream'] == 2, lambda x: x/2))

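Both snippets above produce the same values: feat and another_feat are halved in the rows where stream equals 2, and all other rows are left unchanged.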
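To compare the two variants side by side, here is a minimal sketch. The sample frame is an assumption, and the columns are created as floats so that the halved values fit without a dtype change.

```python
import pandas as pd

# Hypothetical sample data with float columns.
df = pd.DataFrame(
    {
        "stream": [1, 2, 2, 3],
        "feat": [4.0, 4.0, 2.0, 1.0],
        "another_feat": [5.0, 5.0, 9.0, 7.0],
    },
    index=list("abcd"),
)
cols = ["feat", "another_feat"]

# mask() replaces values where the condition is True; the callable receives the frame.
halved = df[cols].mask(df["stream"] == 2, lambda x: x / 2)

# Variant 1: build a new frame from the untouched stream column and the masked columns.
joined = df[["stream"]].join(halved)

# Variant 2: overwrite the matching cells of the original frame in place.
df.update(halved)

print(joined)
print(df)
```

The join() variant leaves the original frame alone, whereas update() modifies it and returns nothing.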

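If the replacement value is a constant (not derived from a function), mask() is even simpler to use. For example, the following code replaces every feat value with 100 in the rows where stream equals 1 or 3.¹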

df[['stream']].join(df.filter(like='feat').mask(df['stream'].isin([1,3]), 100))

¹ The feat columns can also be selected with the filter() method.
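As a quick check of the constant replacement, the following sketch assumes the same small integer frame used earlier in the thread. filter(like='feat') keeps every column whose name contains 'feat', which here means feat and another_feat but not stream.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {"stream": [1, 2, 2, 3], "feat": [4, 4, 2, 1], "another_feat": [5, 5, 9, 7]},
    index=list("abcd"),
)

# Columns whose name contains the substring 'feat'.
feat_cols = df.filter(like="feat")

# Replace the selected columns with 100 wherever stream is 1 or 3.
result = df[["stream"]].join(feat_cols.mask(df["stream"].isin([1, 3]), 100))

print(result)
```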