我想在熊猫数据框架中将两列相乘，并将结果添加到一个新列中

小开

You can use the DataFrame apply method:

order_df['Value'] = order_df.apply(lambda row: (row['Prices']*row['Amount']
if row['Action']=='Sell'
else -row['Prices']*row['Amount']),
axis=1)

It is usually faster to use these methods rather than over for loops.

小开

最佳答案

If we're willing to sacrifice the succinctness of Hayden's solution, one could also do something like this:

In [22]: orders_df['C'] = orders_df.Action.apply(
lambda x: (1 if x == 'Sell' else -1))


In [23]: orders_df   # New column C represents the sign of the transaction
Out[23]:
Prices  Amount Action  C
0       3      57   Sell  1
1      89      42   Sell  1
2      45      70    Buy -1
3       6      43   Sell  1
4      60      47   Sell  1
5      19      16    Buy -1
6      56      89   Sell  1
7       3      28    Buy -1
8      56      69   Sell  1
9      90      49    Buy -1

Now we have eliminated the need for the if statement. Using DataFrame.apply(), we also do away with the for loop. As Hayden noted, vectorized operations are always faster.

In [24]: orders_df['Value'] = orders_df.Prices * orders_df.Amount * orders_df.C


In [25]: orders_df   # The resulting dataframe
Out[25]:
Prices  Amount Action  C  Value
0       3      57   Sell  1    171
1      89      42   Sell  1   3738
2      45      70    Buy -1  -3150
3       6      43   Sell  1    258
4      60      47   Sell  1   2820
5      19      16    Buy -1   -304
6      56      89   Sell  1   4984
7       3      28    Buy -1    -84
8      56      69   Sell  1   3864
9      90      49    Buy -1  -4410

This solution takes two lines of code instead of one, but is a bit easier to read. I suspect that the computational costs are similar as well.

小开

I think an elegant solution is to use the where method (also see the API docs):

In [37]: values = df.Prices * df.Amount


In [38]: df['Values'] = values.where(df.Action == 'Sell', other=-values)


In [39]: df
Out[39]:
Prices  Amount Action  Values
0       3      57   Sell     171
1      89      42   Sell    3738
2      45      70    Buy   -3150
3       6      43   Sell     258
4      60      47   Sell    2820
5      19      16    Buy    -304
6      56      89   Sell    4984
7       3      28    Buy     -84
8      56      69   Sell    3864
9      90      49    Buy   -4410

Further more this should be the fastest solution.

小开

For me, this is the clearest and most intuitive:

values = []
for action in ['Sell','Buy']:
amounts = orders_df['Amounts'][orders_df['Action'==action]].values
if action == 'Sell':
prices = orders_df['Prices'][orders_df['Action'==action]].values
else:
prices = -1*orders_df['Prices'][orders_df['Action'==action]].values
values += list(amounts*prices)
orders_df['Values'] = values

The .values method returns a numpy array allowing you to easily multiply element-wise and then you can cumulatively generate a list by 'adding' to it.

小开

Since this question came up again, I think a good clean approach is using assign.

The code is quite expressive and self-describing:

df = df.assign(Value = lambda x: x.Prices * x.Amount * x.Action.replace({'Buy' : 1, 'Sell' : -1}))

小开

Good solution from bmu. I think it's more readable to put the values inside the parentheses vs outside.

    df['Values'] = np.where(df.Action == 'Sell',
df.Prices*df.Amount,
-df.Prices*df.Amount)

Using some pandas built in functions.

    df['Values'] = np.where(df.Action.eq('Sell'),
df.Prices.mul(df.Amount),
-df.Prices.mul(df.Amount))

小开

To make things neat, I take Hayden's solution but make a small function out of it.

def create_value(row):
if row['Action'] == 'Sell':
return row['Prices'] * row['Amount']
else:
return -row['Prices']*row['Amount']

so that when we want to apply the function to our dataframe, we can do..

df['Value'] = df.apply(lambda row: create_value(row), axis=1)

...and any modifications only need to occur in the small function itself.

Concise, Readable, and Neat!

小开

First, multiply the columns Prices and Amount. Afterwards use mask to negate the values if the condition is True:

df.assign(
Values=(df["Prices"] * df["Amount"]).mask(df["Action"] == "Buy", lambda x: -x)
)