I want to multiply two columns in a pandas DataFrame and add the result into a new column

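I'm trying to multiply two existing columns in a pandas DataFrame (orders_df): Prices (stock close price) and Amount (stock quantities), and add the result to a new column called Value. For some reason, when I run this code, all the rows under the Value column are positive numbers, while some of the rows should be negative. Under the Action column in the DataFrame there are seven rows with the 'Sell' string and seven rows with the 'Buy' string.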

    for i in orders_df.Action:
        if i == 'Sell':
            orders_df['Value'] = orders_df.Prices*orders_df.Amount
        elif i == 'Buy':
            orders_df['Value'] = -orders_df.Prices*orders_df.Amount

Please let me know what I'm doing wrong!
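Why every row ends up with the same sign: each assignment inside the loop writes the entire Value column, not just the row for the current Action, so whichever branch runs on the last iteration decides the sign of every row. A minimal reproduction, using hypothetical sample data (not the asker's real orders) whose last row is a 'Sell':

```python
import pandas as pd

# Hypothetical sample data; the last row is a 'Sell'.
orders_df = pd.DataFrame({
    "Prices": [3, 45, 19, 56],
    "Amount": [57, 70, 16, 89],
    "Action": ["Sell", "Buy", "Buy", "Sell"],
})

for i in orders_df.Action:
    if i == "Sell":
        # Assigns a whole column: every row gets Prices * Amount.
        orders_df["Value"] = orders_df.Prices * orders_df.Amount
    elif i == "Buy":
        # Also a whole column: every row gets the negated product.
        orders_df["Value"] = -orders_df.Prices * orders_df.Amount

# Only the last iteration's assignment survives, so all signs follow the last row.
print(orders_df)
```

Because the last Action here is 'Sell', even the two 'Buy' rows come out positive, which matches the symptom described in the question.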


You can use the DataFrame apply method:

    orders_df['Value'] = orders_df.apply(lambda row: (row['Prices']*row['Amount']
                                                      if row['Action']=='Sell'
                                                      else -row['Prices']*row['Amount']),
                                         axis=1)

It is usually faster to use these methods than explicit for loops.
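A quick check of the apply approach on three rows copied from the example tables in this thread (the DataFrame is built here only for illustration). With axis=1, each row is passed to the lambda as a Series indexed by column name, which is why row['Prices'] works:

```python
import pandas as pd

# Three rows copied from the example tables in this thread.
orders_df = pd.DataFrame({
    "Prices": [3, 89, 45],
    "Amount": [57, 42, 70],
    "Action": ["Sell", "Sell", "Buy"],
})

# axis=1: the lambda receives one row (a Series) at a time.
orders_df["Value"] = orders_df.apply(
    lambda row: (row["Prices"] * row["Amount"]
                 if row["Action"] == "Sell"
                 else -row["Prices"] * row["Amount"]),
    axis=1,
)
print(orders_df)
```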

If we're willing to sacrifice the succinctness of Hayden's solution, we could also do something like this:

    In [22]: orders_df['C'] = orders_df.Action.apply(
                 lambda x: (1 if x == 'Sell' else -1))

    In [23]: orders_df   # New column C represents the sign of the transaction
    Out[23]:
       Prices  Amount Action  C
    0       3      57   Sell  1
    1      89      42   Sell  1
    2      45      70    Buy -1
    3       6      43   Sell  1
    4      60      47   Sell  1
    5      19      16    Buy -1
    6      56      89   Sell  1
    7       3      28    Buy -1
    8      56      69   Sell  1
    9      90      49    Buy -1

Now we have eliminated the need for the if statement. Using Series.apply(), we also do away with the for loop, and the multiplication below operates on whole columns at once. As Hayden noted, vectorized operations are usually faster.

    In [24]: orders_df['Value'] = orders_df.Prices * orders_df.Amount * orders_df.C

    In [25]: orders_df   # The resulting dataframe
    Out[25]:
       Prices  Amount Action  C  Value
    0       3      57   Sell  1    171
    1      89      42   Sell  1   3738
    2      45      70    Buy -1  -3150
    3       6      43   Sell  1    258
    4      60      47   Sell  1   2820
    5      19      16    Buy -1   -304
    6      56      89   Sell  1   4984
    7       3      28    Buy -1    -84
    8      56      69   Sell  1   3864
    9      90      49    Buy -1  -4410

This solution takes two lines of code instead of one, but is a bit easier to read. I suspect that the computational costs are similar as well.
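If you would rather not call a lambda once per element to build C, Series.map with a dict does the same 'Sell'/'Buy' lookup. This swaps map in for apply; it is a sketch on a hypothetical three-row frame with the same columns as orders_df:

```python
import pandas as pd

# Hypothetical three-row frame with the same columns as orders_df.
orders_df = pd.DataFrame({
    "Prices": [3, 45, 56],
    "Amount": [57, 70, 89],
    "Action": ["Sell", "Buy", "Sell"],
})

# map() looks each Action up in the dict: 'Sell' -> 1, 'Buy' -> -1.
# Any other string would become NaN instead of silently getting a sign.
orders_df["C"] = orders_df.Action.map({"Sell": 1, "Buy": -1})
orders_df["Value"] = orders_df.Prices * orders_df.Amount * orders_df.C
print(orders_df)
```

A side benefit of the dict is that an unexpected Action value shows up as NaN in Value rather than being treated as a 'Buy'.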

I think an elegant solution is to use the where method (also see the API docs):

    In [37]: values = df.Prices * df.Amount

    In [38]: df['Values'] = values.where(df.Action == 'Sell', other=-values)

    In [39]: df
    Out[39]:
       Prices  Amount Action  Values
    0       3      57   Sell     171
    1      89      42   Sell    3738
    2      45      70    Buy   -3150
    3       6      43   Sell     258
    4      60      47   Sell    2820
    5      19      16    Buy    -304
    6      56      89   Sell    4984
    7       3      28    Buy     -84
    8      56      69   Sell    3864
    9      90      49    Buy   -4410

Furthermore, this should be the fastest solution.
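How Series.where picks its values, shown on a tiny hypothetical Series: entries where the condition is True are kept, and entries where it is False are taken from `other` at the same position:

```python
import pandas as pd

values = pd.Series([171, 3150, 304])
is_sell = pd.Series([True, False, False])

# Keep values where is_sell is True; use -values where it is False.
signed = values.where(is_sell, other=-values)
print(signed.tolist())  # [171, -3150, -304]
```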

For me, this is the clearest and most intuitive:

    import numpy as np

    values = np.zeros(len(orders_df))
    for action in ['Sell', 'Buy']:
        # Boolean NumPy array selecting the rows with this action
        mask = (orders_df['Action'] == action).values
        amounts = orders_df['Amount'].values[mask]
        if action == 'Sell':
            prices = orders_df['Prices'].values[mask]
        else:
            prices = -1*orders_df['Prices'].values[mask]
        # Write the products back into the same row positions they came from
        values[mask] = amounts*prices
    orders_df['Values'] = values

The .values attribute returns a NumPy array, so you can multiply element-wise; writing each group's products back through the same boolean mask keeps every result on its original row, even when Sell and Buy rows are interleaved.
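Why the per-group results have to go back through the row mask rather than being appended to a list: appending lists all Sell results first and all Buy results second, so assigning that list to a column misplaces values whenever the actions are interleaved. A sketch with hypothetical rows:

```python
import pandas as pd

# Hypothetical rows with interleaved actions.
orders_df = pd.DataFrame({
    "Prices": [3, 45, 6],
    "Amount": [57, 70, 43],
    "Action": ["Sell", "Buy", "Sell"],
})

# Appending per-group results: Sell rows first, then Buy rows.
naive = []
for action in ["Sell", "Buy"]:
    mask = orders_df["Action"] == action
    sign = 1 if action == "Sell" else -1
    naive += (sign * orders_df.loc[mask, "Prices"] * orders_df.loc[mask, "Amount"]).tolist()
print(naive)  # [171, 258, -3150]: row 1's value has moved to position 2

# Writing each group back through its own mask keeps results on their rows.
orders_df["Values"] = 0
for action in ["Sell", "Buy"]:
    mask = orders_df["Action"] == action
    sign = 1 if action == "Sell" else -1
    orders_df.loc[mask, "Values"] = sign * orders_df.loc[mask, "Prices"] * orders_df.loc[mask, "Amount"]
print(orders_df["Values"].tolist())  # [171, -3150, 258]
```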

Since this question came up again, I think a good clean approach is using assign.

The code is quite expressive and self-describing:

    df = df.assign(Value=lambda x: x.Prices * x.Amount * x.Action.map({'Buy': -1, 'Sell': 1}))
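A small illustration of why the lambda form reads well: each callable passed to assign receives the DataFrame being built, so, in recent pandas versions, a later keyword can use a column created by an earlier one in the same call. The intermediate Sign column is a hypothetical addition for illustration, not part of the answer above:

```python
import pandas as pd

df = pd.DataFrame({
    "Prices": [3, 45],
    "Amount": [57, 70],
    "Action": ["Sell", "Buy"],
})

# Sign is computed first; Value's lambda then sees it on x.
result = df.assign(
    Sign=lambda x: x.Action.map({"Sell": 1, "Buy": -1}),
    Value=lambda x: x.Prices * x.Amount * x.Sign,
)
print(result)
print(list(df.columns))  # ['Prices', 'Amount', 'Action'] (assign returns a new frame)
```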

Good solution from bmu. I think it's more readable to compute both branches inside the np.where(...) call rather than beforehand.

    import numpy as np

    df['Values'] = np.where(df.Action == 'Sell',
                            df.Prices*df.Amount,
                            -df.Prices*df.Amount)

The same thing, using some built-in pandas methods:

    df['Values'] = np.where(df.Action.eq('Sell'),
                            df.Prices.mul(df.Amount),
                            -df.Prices.mul(df.Amount))
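A note on how np.where behaves here, as a sketch on hypothetical rows: both branches are computed in full for every row, then np.where picks element-wise from the first where the condition is True and from the second otherwise. It returns a plain NumPy array, which pandas assigns to the new column by position:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Prices": [3, 45, 19],
    "Amount": [57, 70, 16],
    "Action": ["Sell", "Buy", "Buy"],
})

product = df.Prices.mul(df.Amount)
# Both product and -product are fully evaluated before the selection.
out = np.where(df.Action.eq("Sell"), product, -product)
print(type(out).__name__, out.tolist())  # ndarray [171, -3150, -304]
df["Values"] = out  # assigned positionally, row by row
```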

To make things neat, I take Hayden's solution but make a small function out of it.

    def create_value(row):
        if row['Action'] == 'Sell':
            return row['Prices'] * row['Amount']
        else:
            return -row['Prices']*row['Amount']

so that when we want to apply the function to our DataFrame, we can do:

    df['Value'] = df.apply(create_value, axis=1)

...and any modifications only need to occur in the small function itself.

Concise, Readable, and Neat!
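One extra benefit of the small function: it only indexes by column name, so it can be checked on plain dicts before it is ever applied to a DataFrame. A sketch with hypothetical inputs:

```python
import pandas as pd

def create_value(row):
    # Sells are positive, buys are negative.
    if row['Action'] == 'Sell':
        return row['Prices'] * row['Amount']
    else:
        return -row['Prices'] * row['Amount']

# Plain dicts stand in for rows when testing the function on its own.
assert create_value({'Prices': 3, 'Amount': 57, 'Action': 'Sell'}) == 171
assert create_value({'Prices': 45, 'Amount': 70, 'Action': 'Buy'}) == -3150

df = pd.DataFrame({'Prices': [3, 45], 'Amount': [57, 70], 'Action': ['Sell', 'Buy']})
df['Value'] = df.apply(create_value, axis=1)
print(df)
```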

First, multiply the columns Prices and Amount. Then use mask to negate the values where the condition is True:

    df = df.assign(
        Values=(df["Prices"] * df["Amount"]).mask(df["Action"] == "Buy", lambda x: -x)
    )
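How the mask call above works, sketched on hypothetical rows: Series.mask is the inverse of where, replacing entries where the condition is True, and a callable passed as `other` is called with the Series itself, so `lambda x: -x` supplies the negated values:

```python
import pandas as pd

df = pd.DataFrame({
    "Prices": [3, 45, 56],
    "Amount": [57, 70, 89],
    "Action": ["Sell", "Buy", "Sell"],
})

product = df["Prices"] * df["Amount"]
# Replace entries where Action == "Buy" with the callable's result (-product).
masked = product.mask(df["Action"] == "Buy", lambda x: -x)
# The equivalent where() form keeps entries where the condition is False.
assert masked.equals(product.where(df["Action"] != "Buy", -product))
print(masked.tolist())  # [171, -3150, 4984]
```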