FutureWarning: elementwise comparison failed; returning scalar, but in the future will perform elementwise comparison

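I am using pandas 0.19.1 on Python 3. I am getting a warning on these lines of code. I'm trying to get a list of all the row numbers where the string Peter is present in the column Unnamed: 5.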

df = pd.read_excel(xls_path)
myRows = df[df['Unnamed: 5'] == 'Peter'].index.tolist()


"\Python36\lib\site-packages\pandas\core\ops.py:792: FutureWarning: elementwise
comparison failed; returning scalar, but in the future will perform
elementwise comparison
result = getattr(x, name)(y)"


In my experience, the same warning message was caused by a TypeError.

TypeError: invalid type comparison

So you may want to check the data type of the values in Unnamed: 5:

for x in df['Unnamed: 5']:
    print(type(x))  # are they 'str' ?
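For a long column, printing every type gets noisy. A minimal sketch of a summary, assuming a small made-up DataFrame stands in for the spreadsheet (the sample values are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for the column read from the Excel file.
df = pd.DataFrame({'Unnamed: 5': ['Peter', 3, 2.5, 'Paul']})

# Map every value to its Python type, then count each type once.
type_counts = df['Unnamed: 5'].map(type).value_counts()
print(type_counts)

# The column dtype is 'object' when Python types are mixed like this.
print(df['Unnamed: 5'].dtype)
```

If anything other than str shows up in the counts, the comparison with 'Peter' is mixing types.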


import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(3, 2), columns=['num1', 'num2'])
df['num3'] = 3
df.loc[df['num3'] == '3', 'num3'] = 4  # TypeError and the Warning
df.loc[df['num3'] == 3, 'num3'] = 4  # No Error



import numpy as np
print(np.__version__)   # Numpy version '1.12.0'
'x' in np.arange(5)       #Future warning thrown here

FutureWarning: elementwise comparison failed; returning scalar instead, but in the
future will perform elementwise comparison

Another way to reproduce this bug is with the double-equals operator:

import numpy as np
np.arange(5) == np.arange(5).astype(str)    #FutureWarning thrown here


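Numpy and native Python disagree about what should happen when you compare a string to Numpy's numeric types. Notice that the right operand is Python's turf (a primitive string), the operation in the middle is Python's turf, but the left operand is Numpy's turf. Should the result be a Python-style scalar or a Numpy-style ndarray of booleans? Numpy says ndarray of bool; Pythonic developers disagree. Classic standoff.

Should it be an elementwise comparison, or a scalar telling you whether the item exists in the array?

If your code or a library is using the in or == operators to compare a Python string to a Numpy ndarray, the two are not compatible, so when you try it you get back a scalar, but only for now. The warning says this behavior might change in the future, so your code will fall over if Python/Numpy settle on the Numpy style.


Numpy and Python are in a standoff: for now the operation returns a scalar, but in the future that may change.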




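Either lock down your versions of Python and Numpy, ignore the warning and expect the behavior not to change, or convert both the left and right operands of == and in so that they are both Numpy types or both primitive Python numeric types.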
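The second option, making both operands the same kind of type before comparing, might look like the following sketch. The array and the string '3' are made-up sample values.

```python
import numpy as np

numbers = np.arange(5)

# Option 1: convert the array to strings, then compare string with string.
as_text = numbers.astype(str) == '3'

# Option 2: convert the Python string to a number, then compare numerically.
as_number = numbers == int('3')

# Membership test on matching types, instead of `'3' in numbers`.
found = int('3') in numbers

print(as_text)    # [False False False  True False]
print(as_number)  # [False False False  True False]
print(found)      # True
```

Both comparisons are elementwise and neither mixes a string with a numeric array, so there is nothing for the warning to complain about.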


import warnings
import numpy as np
warnings.simplefilter(action='ignore', category=FutureWarning)
print('x' in np.arange(5))   #returns False, without Warning


import warnings
import numpy as np

with warnings.catch_warnings():
    warnings.simplefilter(action='ignore', category=FutureWarning)
    print('x' in np.arange(2))   #returns False, warning is suppressed

print('x' in np.arange(10))   #returns False, Throws FutureWarning

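Just suppress the warning by name, then put a loud comment next to it that mentions the current versions of Python and Numpy, says this code is brittle and requires those versions, and links to this page. Kick the can down the road.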
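Ignoring every FutureWarning can hide unrelated notices. A narrower sketch filters on the message text with the standard warnings.filterwarnings call. To keep the example independent of the installed Numpy version, the warnings here are raised by hand; the helper name demo_targeted_filter is made up for illustration.

```python
import warnings

def demo_targeted_filter():
    """Return the messages that survive a filter aimed at one warning text."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        # `message` is a regex matched against the start of the warning text.
        warnings.filterwarnings(
            'ignore',
            message='elementwise comparison failed',
            category=FutureWarning,
        )
        # Simulated warnings: the first matches the filter, the second does not.
        warnings.warn('elementwise comparison failed; returning scalar instead',
                      FutureWarning)
        warnings.warn('some other deprecation notice', FutureWarning)
    return [str(w.message) for w in caught]

print(demo_targeted_filter())  # ['some other deprecation notice']
```

Only the targeted message is silenced, so other FutureWarnings still reach you.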


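If your arrays aren't too big, or you don't have too many of them, you might be able to get away with forcing the left-hand side of == to be strings. Note that calling str() on a whole Series yields a single string (its printed representation), so the conversion has to be elementwise, for example with astype(str):

myRows = df[df['Unnamed: 5'].astype(str) == 'Peter'].index.tolist()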

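But converting has a cost. In the timings below, which wrap the left-hand side in str(), it is about 1.5 times slower if the left-hand side is already a string, 25-30 times slower if it is a small numpy array (length = 10), and 150-160 times slower if it is a numpy array of length 100 (times averaged over 500 trials).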

import time
from numpy import linspace, zeros, mean

a = linspace(0, 5, 10)
b = linspace(0, 50, 100)
n = 500
string1 = 'Peter'
string2 = 'blargh'
times_a = zeros(n)
times_str_a = zeros(n)
times_s = zeros(n)
times_str_s = zeros(n)
times_b = zeros(n)
times_str_b = zeros(n)
for i in range(n):
    # Time each comparison with and without the str() conversion.
    t0 = time.time()
    tmp1 = a == string1
    t1 = time.time()
    tmp2 = str(a) == string1
    t2 = time.time()
    tmp3 = string2 == string1
    t3 = time.time()
    tmp4 = str(string2) == string1
    t4 = time.time()
    tmp5 = b == string1
    t5 = time.time()
    tmp6 = str(b) == string1
    t6 = time.time()
    times_a[i] = t1 - t0
    times_str_a[i] = t2 - t1
    times_s[i] = t3 - t2
    times_str_s[i] = t4 - t3
    times_b[i] = t5 - t4
    times_str_b[i] = t6 - t5
print('Small array:')
print('Time to compare without str conversion: {} s. With str conversion: {} s'.format(mean(times_a), mean(times_str_a)))
print('Ratio of time with/without string conversion: {}'.format(mean(times_str_a)/mean(times_a)))

print('\nBig array')
print('Time to compare without str conversion: {} s. With str conversion: {} s'.format(mean(times_b), mean(times_str_b)))

print('Time to compare without str conversion: {} s. With str conversion: {} s'.format(mean(times_s), mean(times_str_s)))
print('Ratio of time with/without string conversion: {}'.format(mean(times_str_s)/mean(times_s)))


Small array:
Time to compare without str conversion: 6.58464431763e-06 s. With str conversion: 0.000173756599426 s
Ratio of time with/without string conversion: 26.3881526541

Big array
Time to compare without str conversion: 5.44309616089e-06 s. With str conversion: 0.000870866775513 s

Time to compare without str conversion: 5.89370727539e-07 s. With str conversion: 8.30173492432e-07 s
Ratio of time with/without string conversion: 1.40857605178
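time.time() can be too coarse for operations that take microseconds, so a sketch with the standard timeit module may give steadier numbers. The array size and repeat count are arbitrary assumptions, and results vary by machine and Numpy version, so no expected output is shown.

```python
import timeit
import numpy as np

a = np.linspace(0, 5, 10)   # small numeric array, same shape as in the benchmark
target = 'Peter'
repeats = 1000              # arbitrary repeat count for this sketch

# Elementwise: convert every element to a string, then compare.
t_astype = timeit.timeit(lambda: a.astype(str) == target, number=repeats)

# Whole-array: str(a) builds one string holding the array's printed form.
t_str = timeit.timeit(lambda: str(a) == target, number=repeats)

print('astype(str): {:.3e} s per call'.format(t_astype / repeats))
print('str():       {:.3e} s per call'.format(t_str / repeats))
```

Keep in mind that the two conversions are not equivalent: str(a) == target is a single Python bool, while a.astype(str) == target is a boolean array.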

A quick workaround for this is to use numpy.core.defchararray. I ran into the same warning message and was able to resolve it using that module.

import numpy.core.defchararray as npd
resultdataset = npd.equal(dataset1, dataset2)
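The same functions are also exposed under the public np.char namespace. A minimal sketch with made-up sample arrays, assuming both inputs are string arrays of the same shape (numeric data is converted with astype(str) first):

```python
import numpy as np

# Hypothetical sample data: two string arrays of equal shape.
names = np.array(['s', 'b', 's', 'b'])
wanted = np.array(['s', 's', 's', 's'])

# Elementwise string equality; the result is a boolean array.
matches = np.char.equal(names, wanted)
count = int(matches.sum())

# Numeric arrays need converting to strings before the comparison.
ids = np.arange(4)
id_matches = np.char.equal(ids.astype(str), np.array(['0', '1', '5', '3']))

print(matches, count)   # [ True False  True False] 2
print(id_matches)       # [ True  True False  True]
```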

I got the same error when I tried to set index_col while reading a file into a pandas DataFrame:

df = pd.read_csv('my_file.tsv', sep='\t', header=0, index_col=['0'])  ## or same with the following
df = pd.read_csv('my_file.tsv', sep='\t', header=0, index_col=[0])

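I had never run into this error before, and I am still trying to work out the reason behind it (using @Eric Leschinski's explanation and others). For now, reading the file without index_col and setting the index afterwards works around it: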


df = pd.read_csv('my_file.tsv', sep='\t', header=0)  ## not setting the index_col
df.set_index(['0'], inplace=True)


I got this warning because I thought my column contained empty strings, but on checking, it contained np.nan!

if df['column'] == '':
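A sketch of how the two cases can be told apart, using a made-up column that holds both np.nan and an empty string. Since a Series comparison yields one bool per row, the condition for an if statement has to be reduced with something like .any().

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing real strings, np.nan and an empty string.
column = pd.Series(['a', np.nan, '', 'b'])

# Missing values are detected with isna(), not by comparing with ''.
is_missing = column.isna()

# Treat missing values as blank, then compare string with string.
is_blank = column.fillna('') == ''

if is_blank.any():
    print('blank or missing rows:', is_blank[is_blank].index.tolist())
```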




>>> import numpy as np
>>> import operator
>>> import pandas as pd

>>> x = [1, 2, 1, 2]
>>> %time count = np.sum(np.equal(1, x))
>>> print("Count {} using numpy equal with ints".format(count))
CPU times: user 52 µs, sys: 0 ns, total: 52 µs
Wall time: 56 µs
Count 2 using numpy equal with ints

So our baseline is that the count should be the correct value, 2, and the operation should take around 50 µs.


>>> x = ['s', 'b', 's', 'b']
>>> %time count = np.sum(np.equal('s', x))
>>> print("Count {} using numpy equal".format(count))
CPU times: user 145 µs, sys: 24 µs, total: 169 µs
Wall time: 158 µs
Count NotImplemented using numpy equal
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/ipykernel_launcher.py:1: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
"""Entry point for launching an IPython kernel.

Here we get the wrong answer (NotImplemented != 2), it takes a long time, and it throws the warning.


>>> %time count = np.sum(x == 's')
>>> print("Count {} using ==".format(count))
CPU times: user 46 µs, sys: 1 µs, total: 47 µs
Wall time: 50.1 µs
Count 0 using ==

Again the answer is wrong (0 != 2). This one is even more insidious because there is no subsequent warning (0 can be passed around just like 2).
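The count of 0 comes from x being a plain Python list: a list compared with a string is a single False, not an elementwise result. A short sketch of the difference once the list is converted to a Numpy array of strings:

```python
import numpy as np

x = ['s', 'b', 's', 'b']

# A Python list compared with a string is one scalar bool.
list_result = (x == 's')

# A Numpy string array compared with a string is elementwise.
arr = np.array(x)
count = int(np.sum(arr == 's'))

print(list_result)  # False
print(count)        # 2
```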


>>> %time count = np.sum([operator.eq(_x, 's') for _x in x])
>>> print("Count {} using list comprehension".format(count))
CPU times: user 55 µs, sys: 1 µs, total: 56 µs
Wall time: 60.3 µs
Count 2 using list comprehension



>>> y = pd.Series(x)
>>> %time count = np.sum(y == 's')
>>> print("Count {} using pandas ==".format(count))
CPU times: user 453 µs, sys: 31 µs, total: 484 µs
Wall time: 463 µs
Count 2 using pandas ==


Finally, the option I'm going to use: casting the numpy array to the object type:

>>> x = np.array(['s', 'b', 's', 'b']).astype(object)
>>> %time count = np.sum(np.equal('s', x))
>>> print("Count {} using numpy equal".format(count))
CPU times: user 50 µs, sys: 1 µs, total: 51 µs
Wall time: 55.1 µs
Count 2 using numpy equal



import dateutil.parser

for t in dfObj['time']:
    if type(t) == str:
        the_date = dateutil.parser.parse(t)
        loc_dt_int = int(the_date.timestamp())
        dfObj.loc[t == dfObj.time, 'time'] = loc_dt_int


for t in dfObj['time']:
    try:
        the_date = dateutil.parser.parse(t)
        loc_dt_int = int(the_date.timestamp())
        dfObj.loc[t == dfObj.time, 'time'] = loc_dt_int
    except Exception:
        pass  # skip values that cannot be parsed as dates

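I changed the loop to the second version to avoid the comparison that throws the warning, as described above. I only had to catch the exception because dfObj.loc is inside the for loop; maybe there is a way to tell it not to check the rows it has already changed.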
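One way to avoid both the in-loop comparison and the try/except is to convert each value in a single pass. This is only a sketch under assumptions: the helper name to_epoch_seconds and the sample values are made up, and the sample string carries an explicit UTC offset so the result does not depend on the local timezone.

```python
import dateutil.parser
import pandas as pd

def to_epoch_seconds(value):
    """Parse strings into integer epoch seconds; return other values unchanged."""
    if isinstance(value, str):
        return int(dateutil.parser.parse(value).timestamp())
    return value

# Hypothetical column mixing a date string with an already-converted integer.
dfObj = pd.DataFrame({'time': ['2020-01-01T00:00:00+00:00', 1577836801]})

# Each value is visited exactly once, so nothing is compared against the column.
dfObj['time'] = dfObj['time'].map(to_epoch_seconds)

print(dfObj['time'].tolist())  # [1577836800, 1577836801]
```

Because already-converted integers pass through untouched, no row is re-parsed.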

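Eric's answer helpfully explains that the trouble comes from comparing a pandas Series (which contains a NumPy array) to a Python string. Unfortunately, both of his workarounds just suppress the warning.

To write code that doesn't cause the warning in the first place, explicitly compare your string to each element of the Series and get a separate bool for each. For example, you could use map and an anonymous function.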

myRows = df[df['Unnamed: 5'].map( lambda x: x == 'Peter' )].index.tolist()


myRows = df[df['Unnamed: 5'].isin( [ 'Peter' ] )].index.tolist()

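Can't beat Eric Leschinski's awesomely detailed answer, but here's a quick workaround for the original question that I don't think has been mentioned yet: put the string in a list and use .isin instead of ==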


import pandas as pd
import numpy as np

df = pd.DataFrame({"Name": ["Peter", "Joe"], "Number": [1, 2]})

# Raises warning using == to compare different types:
df.loc[df["Number"] == "2", "Number"]

# No warning using .isin:
df.loc[df["Number"].isin(["2"]), "Number"]

In my case, the warning occurred with perfectly ordinary boolean indexing, because the Series contained only np.nan. Demonstration (pandas 1.0.3):

>>> import pandas as pd
>>> import numpy as np
>>> pd.Series([np.nan, 'Hi']) == 'Hi'
0    False
1     True
>>> pd.Series([np.nan, np.nan]) == 'Hi'
~/anaconda3/envs/ms3/lib/python3.7/site-packages/pandas/core/ops/array_ops.py:255: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
res_values = method(rvalues)
0    False
1    False

I think that with pandas 1.0 they really want you to use the new 'string' datatype, which allows for pd.NA values:

>>> pd.Series([pd.NA, pd.NA]) == 'Hi'
0    False
1    False
>>> pd.Series([np.nan, np.nan], dtype='string') == 'Hi'
0    <NA>
1    <NA>
>>> (pd.Series([np.nan, np.nan], dtype='string') == 'Hi').fillna(False)
0    False
1    False


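In my case, I had compared a numpy.ndarray with "" (an empty string).

# Didn't work (raised the warning).
if self.images[0] != "":
# Also didn't work.
if self.images[0].astype(str) != "":

# Checking the length instead avoided the comparison.
if len(self.images[0]) != 0: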
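A minimal sketch of that kind of check as a standalone helper, assuming the image is either a Numpy array or some placeholder value. The name has_content is hypothetical, and size is used instead of len() so that multi-dimensional arrays with a zero-length axis also count as empty.

```python
import numpy as np

def has_content(image):
    """Return True only for a Numpy array that holds at least one element."""
    return isinstance(image, np.ndarray) and image.size != 0

print(has_content(np.zeros((2, 2))))  # True
print(has_content(np.array([])))      # False
print(has_content(""))                # False
```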