Correct way to set a new column in a pandas DataFrame to avoid SettingWithCopyWarning

I am trying to create a new column in the netc DataFrame, but I get a warning:

netc["DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM


C:\Anaconda\lib\site-packages\ipykernel\__main__.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

What is the correct way to create a column in newer versions of pandas so that the warning does not appear?

pd.__version__
Out[45]:
u'0.19.2+0.g825876c.dirty'

As the warning says, try using .loc[row_indexer,col_indexer] to create the new column.

netc.loc[:,"DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM

Notes

According to the pandas indexing docs, your code should work.

netc["DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM

gets translated to

netc.__setitem__('DeltaAMPP', netc.LOAD_AM - netc.VPP12_AM)

which should have predictable behaviour. The SettingWithCopyWarning is only there to warn users of unexpected behaviour during chained assignment (which is not what you're doing). However, as mentioned in the docs,

Sometimes a SettingWithCopy warning will arise at times when there’s no obvious chained indexing going on. These are the bugs that SettingWithCopy is designed to catch! Pandas is probably trying to warn you that you’ve done this:

The docs then go on to give an example of when one might get that warning even when it's not expected. So I can't tell why that's happening without more context.

Your example is incomplete, as it doesn't show where netc comes from. It is likely that netc itself is the product of slicing, in which case pandas cannot guarantee whether it is a view or a copy.
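The situation the docs describe can be sketched roughly as follows. This is a paraphrase, not a quotation from the docs: the function name add_delta and the sample values are made up for illustration, and only the column names come from the question. The point is that the indexing and the assignment can sit far apart, so no chained indexing is visible on any single line.

```python
import pandas as pd


def add_delta(df):
    """Hypothetical helper: derive a subset, then add a column to it."""
    # Indexing here produces the "slice" the warning message talks about.
    subset = df[df["LOAD_AM"] > 0]
    # ... imagine many unrelated lines here ...
    # Without this explicit copy, the assignment below is the line that
    # older pandas versions may flag with SettingWithCopyWarning.
    subset = subset.copy()
    subset["DeltaAMPP"] = subset["LOAD_AM"] - subset["VPP12_AM"]
    return subset


# Made-up sample data using the question's column names.
netb = pd.DataFrame({"LOAD_AM": [10.0, 0.0, 30.0], "VPP12_AM": [1.0, 2.0, 3.0]})
result = add_delta(netb)
print(result)
```

The input frame is left without the new column, since the assignment only touches the explicit copy.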

For example, if you're doing this:

netc = netb[netb["DeltaAMPP"] == 0]
netc["DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM

then Pandas wouldn't know if netc is a view or a copy. If it were a one-liner, it would effectively be like this:

netb[netb["DeltaAMPP"] == 0]["DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM

where you can see the double indexing more clearly.

If you want to make netc separate from netb, one possible remedy might be to force a copy in the first line (the loc is to make sure we're not copying two times), like:

netc = netb.loc[netb["DeltaAMPP"] == 0].copy()

If, on the other hand, you want to have netb modified with the new column, you may do:

netb.loc[netb["DeltaAMPP"] == 0, "DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM
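The two remedies above can be compared side by side in a minimal sketch. The data values are invented, and netb_updated is a name introduced here only so that both variants can be shown without interfering with each other.

```python
import pandas as pd

# Made-up sample data; column names follow the question.
netb = pd.DataFrame({
    "LOAD_AM": [10.0, 20.0, 30.0],
    "VPP12_AM": [1.0, 2.0, 3.0],
    "DeltaAMPP": [0.0, 5.0, 0.0],
})
mask = netb["DeltaAMPP"] == 0

# Remedy 1: an independent copy; netb itself is left untouched.
netc = netb.loc[mask].copy()
netc["DeltaAMPP"] = netc["LOAD_AM"] - netc["VPP12_AM"]

# Remedy 2: write into the parent frame with a single .loc call.
# (Done on a copy of netb here only to keep the two remedies separate.)
netb_updated = netb.copy()
netb_updated.loc[mask, "DeltaAMPP"] = (
    netb_updated["LOAD_AM"] - netb_updated["VPP12_AM"]
)

print(netc)
print(netb_updated)
```

In the second remedy the right-hand side is aligned on the index, so only the rows selected by the mask receive new values.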

You need to call reset_index before you create the column, especially if you have filtered on specific values. Then you don't need to use .loc[row_indexer,col_indexer]:

netc.reset_index(drop=True, inplace=True)
netc["DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM

Then it should work :)

I ran into the SettingWithCopyWarning when assigning data to a DataFrame df that had been constructed by indexing. Both commands

  • df['new_column'] = something
  • df.loc[:, 'new_column'] = something

triggered the warning. Once I copied df (DataFrame.copy()), everything was fine.

In the code below, compare df0 = df_test[df_test['a']>3] and df1 = df_test[df_test['a']>3].copy(). For df0 both assignments raise the warning. For df1 both work fine.

>>> df_test
      a     b     c     d  e
0   0.0   1.0   2.0   3.0  0
1   4.0   5.0   6.0   7.0  1
2   8.0   9.0  10.0  11.0  2
3  12.0  13.0  14.0  15.0  3
4  16.0  17.0  18.0  19.0  4
>>> df0 = df_test[df_test['a']>3]
>>> df1 = df_test[df_test['a']>3].copy()
>>> df0['e'] = np.arange(4)
__main__:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead


See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
>>> df1['e'] = np.arange(4)
>>> df0.loc[2, 'a'] = 77
/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py:1719: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead


See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_column(loc, value, pi)
>>> df1.loc[2, 'a'] = 77
>>> df0
      a     b     c     d  e
1   4.0   5.0   6.0   7.0  0
2  77.0   9.0  10.0  11.0  1
3  12.0  13.0  14.0  15.0  2
4  16.0  17.0  18.0  19.0  3
>>> df1
      a     b     c     d  e
1   4.0   5.0   6.0   7.0  0
2  77.0   9.0  10.0  11.0  1
3  12.0  13.0  14.0  15.0  2
4  16.0  17.0  18.0  19.0  3

By the way, it is worth reading the docs on this issue (linked in the warning).
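The df1 half of the session can be reproduced as a standalone script. The construction of df_test is an assumption, since the session only shows its printed form, and matching warnings by class name is simply a way to keep the sketch independent of the pandas version.

```python
import warnings

import numpy as np
import pandas as pd

# Assumed construction that yields a frame shaped like df_test above.
df_test = pd.DataFrame(np.arange(20.0).reshape(5, 4), columns=list("abcd"))
df_test["e"] = np.arange(5)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df1 = df_test[df_test["a"] > 3].copy()  # explicit, independent copy
    df1["e"] = np.arange(4)
    df1.loc[2, "a"] = 77

# Keep only SettingWithCopy-style warnings, matched by class name.
copy_warnings = [w for w in caught if "SettingWithCopy" in w.category.__name__]
print(len(copy_warnings))   # 0: the explicit copy does not warn
print(df_test.loc[2, "a"])  # 8.0: the parent frame is unchanged
```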

As pointed out in other answers, there is a good chance that you have done some filtering on the data, else this warning should not have popped up (since your steps are correct).

Assuming you have done some filtering, you could try doing the following steps:

netc_copied = netc.copy()
netc.loc[:, "DeltaAMPP"] = netc_copied["LOAD_AM"] - netc_copied["VPP12_AM"]

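Note that I have added the new column to the original DataFrame. You could add it to the copied DataFrame instead; as the df0/df1 comparison in another answer shows, assigning to the explicit copy is the variant that stays free of the warning.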

A simpler solution is to just use 'assign':

netc = netc.assign(DeltaAMPP=netc_copied['LOAD_AM']-netc_copied['VPP12_AM'])

Alternatively you can also use eval:

netc.eval('DeltaAMPP = LOAD_AM - VPP12_AM', inplace = True)

Since inplace=True you don't need to assign it back to netc.
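As a minimal sketch of the non-mutating variants: both assign and eval (when called without inplace=True) return a new DataFrame, so the filtered frame itself is never written to. The sample values and the names with_assign and with_eval are made up for illustration.

```python
import pandas as pd

# Made-up sample data; column names follow the question.
netb = pd.DataFrame({"LOAD_AM": [10.0, 20.0, 30.0], "VPP12_AM": [1.0, 2.0, 3.0]})
netc = netb[netb["LOAD_AM"] > 10]  # a filtered frame, as in the question

# assign returns a new DataFrame; the lambda receives the frame being assigned to.
with_assign = netc.assign(DeltaAMPP=lambda d: d["LOAD_AM"] - d["VPP12_AM"])

# eval without inplace=True also returns a new DataFrame.
with_eval = netc.eval("DeltaAMPP = LOAD_AM - VPP12_AM")

print(with_assign)
print(with_eval)
```

Rebinding the result to the old name (netc = netc.assign(...)) gives the same effect as the one-liner above while keeping the assignment out of the filtered object.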

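You will still get the warning even after using .loc or .iloc for slicing. All you have to do is reset the index after slicing and keep the result, since reset_index returns a new DataFrame unless inplace=True is passed:

df = df.reset_index()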