re.sub erroring with "expected string or bytes-like object"


Best answer

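As you noted in the comments, some of the values appear to be floats rather than strings. They need to be converted to strings before being passed to re.sub. The simplest way is to change location to str(location) in the re.sub call. If the value is already a str, the conversion does no harm.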

import re

letters_only = re.sub("[^a-zA-Z]",  # Search for all non-letters
                      " ",          # Replace all non-letters with spaces
                      str(location))  # Convert to str so floats do not raise TypeError
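To make the failure and the fix concrete, here is a minimal sketch. It assumes the floats come from missing values (NaN) in the data, which the original post does not confirm. The helper name letters_only and the sample values are hypothetical. Note that a plain str() call turns NaN into the literal text "nan", so the sketch maps missing values to an empty string instead.

```python
import math
import re


def letters_only(value):
    """Replace every non-letter in value with a space (hypothetical helper)."""
    # Assumption: missing values show up as None or float NaN; treat them as
    # empty text instead of letting str() turn NaN into the string "nan".
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return ""
    return re.sub("[^a-zA-Z]", " ", str(value))


samples = ["New York, NY", float("nan"), 42.0]  # made-up sample values

try:
    re.sub("[^a-zA-Z]", " ", samples[1])  # passing a float directly fails
except TypeError as exc:
    print("TypeError:", exc)

cleaned = [letters_only(s) for s in samples]
print(cleaned)
```

Whether an empty string is the right stand-in for a missing value depends on what happens to the text afterwards. Dropping those rows beforehand is an equally reasonable choice.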


I think it would be better to use the re.match() function.

import re
import nltk
from nltk.tokenize import word_tokenize

nltk.download('punkt')  # Tokenizer data required by word_tokenize
sentences = word_tokenize("I love to learn NLP \n 'a :(")
# Keep only tokens that start with a letter, lowercased
sentences = [word.lower() for word in sentences if re.match('^[a-zA-Z]+', word)]
sentences


The simplest solution is to apply Python's str function to the column you are trying to loop through.

If you are using pandas, this can be done as:

dataframe['column_name']=dataframe['column_name'].apply(str)


I had the same problem. Interestingly, nothing I tried fixed it until I realized there were two special characters in the string.

For example, in my case the text contained two characters:

&lrm; (left-to-right mark, http://en.wikipedia.org/wiki/left-to-right_mark) and &zwnj; (zero-width non-joiner, https://en.wikipedia.org/wiki/Zero-width_non-joiner)

My fix was to remove these two characters, and that solved the problem.

import re

mystring = "&lrm;Some Time W&zwnj;e"
mystring = re.sub(r"&lrm;", "", mystring)
mystring = re.sub(r"&zwnj;", "", mystring)

I hope this helps anyone who runs into the same problem I did.
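The snippet above removes the entity text &lrm; and &zwnj; as written. If the string instead holds the invisible characters themselves (U+200E and U+200C), those patterns will not match. The following is a minimal sketch that handles both cases, assuming it is acceptable to decode every HTML entity in the text first (html.unescape also decodes others such as &amp;). The helper name strip_marks is hypothetical.

```python
import html
import re

# U+200E LEFT-TO-RIGHT MARK and U+200C ZERO WIDTH NON-JOINER
INVISIBLE = re.compile("[\u200e\u200c]")


def strip_marks(text):
    """Decode HTML entities, then drop the two invisible characters."""
    decoded = html.unescape(text)  # "&lrm;" -> U+200E, "&zwnj;" -> U+200C
    return INVISIBLE.sub("", decoded)


from_entities = strip_marks("&lrm;Some Time W&zwnj;e")
from_raw_chars = strip_marks("\u200eSome Time W\u200ce")
print(from_entities)
print(from_raw_chars)
```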


In my experience with Python, this error is caused by a None value passed as the second argument to re.findall().

import re
x = re.findall(r"\[(.*?)\]", None)

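The code example above reproduces the error.

To avoid this error message, you can filter out None values beforehand or add a condition that excludes them from processing.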
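One way to add such a condition is sketched below. The helper name find_bracketed and the sample rows are hypothetical, and returning an empty list for None is an assumption about what the caller wants.

```python
import re

BRACKETED = re.compile(r"\[(.*?)\]")


def find_bracketed(text):
    """Return the bracketed substrings in text, or [] when text is None."""
    if text is None:  # guard: re.findall raises TypeError on None
        return []
    return BRACKETED.findall(text)


rows = ["a [b] c [d]", None, "no brackets"]  # made-up sample values
results = [find_bracketed(row) for row in rows]
print(results)
```

If None rows should be dropped rather than mapped to an empty result, filtering them out first with a list comprehension works just as well.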