How to count the words in a sentence, ignoring numbers, punctuation and whitespace?

How would I go about counting the words in a sentence? I'm using Python.

For example, I might have the string:

string = "I     am having  a   very  nice  23!@$      day. "

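That would be 7 words. I'm having trouble with the random amount of spaces before/after each word, and also with what happens when numbers or symbols are involved.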


str.split() without any arguments splits on runs of whitespace characters:

>>> s = 'I am having a very nice day.'
>>>
>>> len(s.split())
7
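Applied to the string from the question, though, plain split() gives 8 rather than 7, because 23!@$ is a whitespace-separated token like any other. A minimal sketch that filters the tokens, assuming a "word" is any token containing at least one letter:

```python
s = "I     am having  a   very  nice  23!@$      day. "

tokens = s.split()
print(len(tokens))  # 8, because '23!@$' is a token too

# Assumption: a "word" is any token containing at least one letter
words = [t for t in tokens if any(c.isalpha() for c in t)]
print(len(words))   # 7
```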

From the str.split() documentation:

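If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.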
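To see the behavior the quoted paragraph describes, here is a small comparison of splitting on an explicit single space versus calling split() with no argument:

```python
s = "  a  b "

# sep=" ": every single space is a separator, so runs and edges give empty strings
with_sep = s.split(" ")
print(with_sep)  # ['', '', 'a', '', 'b', '']

# No argument: whitespace runs collapse and no empty strings appear at the edges
no_sep = s.split()
print(no_sep)    # ['a', 'b']

# "Whitespace" here also covers tabs and newlines
print("a\tb\nc".split())  # ['a', 'b', 'c']
```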

You can use re.findall():

import re
line = " I am having a very nice day."
count = len(re.findall(r'\w+', line))
print(count)
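Note that \w matches digits and the underscore as well as letters, so on the question's string this pattern counts 23 and returns 8. One possible letters-only pattern, a sketch rather than the only option, uses the negated class [^\W\d_]:

```python
import re

s = "I     am having  a   very  nice  23!@$      day. "

# \w also matches digits and the underscore, so '23' counts as a word
with_digits = re.findall(r"\w+", s)
print(len(with_digits))   # 8

# [^\W\d_] = a word character that is not a digit or underscore,
# which in practice means a letter (non-ASCII letters included)
letters_only = re.findall(r"[^\W\d_]+", s)
print(len(letters_only))  # 7

print(re.findall(r"[^\W\d_]+", "café 42"))  # ['café']
```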

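Well, this is how I'd do it. I noticed that you want the output to be 7, which means you don't want to count special characters and numbers. Here's the regex pattern: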

re.findall("[a-zA-Z_]+", string)

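where [a-zA-Z_] matches any single character from a-z (lowercase), A-Z (uppercase), or the underscore, and the + matches one or more of them in a row.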
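Because the apostrophe is not in that character class, a contraction such as don't is split into two matches, while snake_case stays whole because of the underscore. If contractions should count as one word, a hypothetical variant that allows internal apostrophes might look like this:

```python
import re

text = "I don't like snake_case 42 times"

# The class from the answer: ASCII letters and the underscore
plain = re.findall(r"[a-zA-Z_]+", text)
print(plain)             # ['I', 'don', 't', 'like', 'snake_case', 'times']

# Hypothetical variant: letters, optionally joined by internal apostrophes
with_apostrophes = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
print(with_apostrophes)  # ['I', "don't", 'like', 'snake', 'case', 'times']
```

The variant drops the underscore, so snake_case becomes two words; which behavior is right depends on what you want to treat as a word.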


As for the spaces: if you want to remove all the extra spaces, just do:

string = string.strip()  # Remove all extra spaces at the start and end of the string
while "  " in string:  # While there are two spaces between words in our string...
    string = string.replace("  ", " ")  # ...replace them with one space!
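Another common idiom, offered here as an alternative rather than a correction, normalizes the whitespace in one step by splitting and re-joining; unlike the replace loop, it also collapses tabs and newlines:

```python
s = "I     am having  a   very  nice  23!@$      day. "

# split() drops leading/trailing whitespace and collapses runs;
# joining with a single space gives the normalized string
normalized = " ".join(s.split())
print(repr(normalized))  # 'I am having a very nice 23!@$ day.'
```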

Here's a simple word counter using regex. The script runs in a loop that you can end by typing Done.

# word counter using regex
import re
while True:
    string = input("Enter the string: ")
    if string == "Done":  # command to terminate the loop
        break
    count = len(re.findall("[a-zA-Z_]+", string))
    print(count)
print("Terminated")
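The interactive loop is hard to test as written. One way to make it testable, sketched here with a hypothetical run_counter function whose read_line parameter is an assumption (it defaults to the built-in input), uses the two-argument form iter(callable, sentinel) to stop at Done:

```python
import re

def run_counter(read_line=input):
    """Print the word count of each line until the user types 'Done'.

    read_line is a hypothetical, injectable stand-in for input() so the
    loop can be driven from a list in tests.
    """
    counts = []
    # iter(callable, sentinel) keeps calling the lambda until it returns "Done"
    for line in iter(lambda: read_line("Enter the string: "), "Done"):
        count = len(re.findall("[a-zA-Z_]+", line))
        print(count)
        counts.append(count)
    print("Terminated")
    return counts
```

Calling run_counter() with no arguments behaves like the loop above; passing a function that pops lines from a list lets you check the counts without typing anything.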
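def wordCount(mystring):
    tempcount = 0  # length of the current run of spaces
    count = 1      # assume the string holds at least one word

    try:
        for character in mystring:
            if character == " ":
                tempcount += 1
                # The first space of a run marks the end of a word
                if tempcount == 1:
                    count += 1
            else:
                tempcount = 0
        return count
    except Exception:
        # e.g. passing an int raises TypeError when we try to iterate over it
        error = "Not a string"
        return error


mystring = "I   am having   a    very nice 23!@$      day."

print(wordCount(mystring))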

The output is 8, since the token 23!@$ is also counted as a word.
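This counter starts at 1 and adds one per run of spaces, so every leading or trailing run of spaces adds a phantom word (the question's own string, which ends in a space, gives 9) and an empty string reports 1. A sketch of the same character-by-character idea that counts word starts instead, using str.isspace() so tabs also act as separators (word_count is an illustrative name):

```python
def word_count(text):
    count = 0
    in_word = False  # are we currently inside a run of non-space characters?
    for ch in text:
        if ch.isspace():
            in_word = False
        elif not in_word:
            # First non-space character after a gap (or at the start) begins a word
            in_word = True
            count += 1
    return count

print(word_count("I     am having  a   very  nice  23!@$      day. "))  # 8
print(word_count("   "))  # 0
```

It still counts 23!@$ as a word, so on its own it only fixes the whitespace handling, not the filtering of numbers and symbols.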

import string

s = "I     am having  a   very  nice  23!@$      day. "
print(sum(i.strip(string.punctuation).isalpha() for i in s.split()))  # 7

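The statement above goes through each whitespace-separated chunk of text, strips punctuation from both ends, and then checks whether what remains consists only of letters.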
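Printing the intermediate value for each token shows what the filter is doing; the final line shows a case it rejects, a contraction whose apostrophe is in the middle rather than at an end:

```python
import string

s = "I     am having  a   very  nice  23!@$      day. "

# (token, token with edge punctuation removed, is the rest all letters?)
checks = [(t, t.strip(string.punctuation), t.strip(string.punctuation).isalpha())
          for t in s.split()]
for row in checks:
    print(row)
# ... ('23!@$', '23', False), ('day.', 'day', True)

# strip() only trims the ends, so a contraction fails the isalpha() test
print("don't".strip(string.punctuation).isalpha())  # False
```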

How about using a simple loop to count the spaces?

txt = "Just an example here move along"
count = 1
for i in txt:
    if i == " ":
        count += 1
print(count)
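That gives the right answer only when words are separated by exactly one space, which is precisely what the question's string does not guarantee. A small sketch wrapping the same loop in a function (count_by_spaces is a hypothetical name) makes the failure visible:

```python
def count_by_spaces(txt):
    # Same logic as the loop above: each space means one more word
    count = 1
    for ch in txt:
        if ch == " ":
            count += 1
    return count

example = "Just an example here move along"
print(count_by_spaces(example))                     # 6
print(count_by_spaces(example.replace(" ", "  ")))  # 11: double spaces are double-counted
print(count_by_spaces(""))                          # 1: even an empty string "has" a word
print(example.count(" ") + 1)                       # 6: str.count does the same job
```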

import string

sentence = "I     am having  a   very  nice  23!@$      day. "
# Remove all punctuation
sentence = sentence.translate(str.maketrans('', '', string.punctuation))
# Remove all digits
sentence = ''.join([char for char in sentence if not char.isdigit()])
# Count word starts: a non-space character that is first or follows whitespace
# (this also counts the last word when the sentence has no trailing space)
count = 0
for index, char in enumerate(sentence):
    if not char.isspace() and (index == 0 or sentence[index - 1].isspace()):
        count += 1
print(count)  # 7
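The two removal passes can also be folded into a single translation table, after which split() takes care of all the whitespace. A sketch, with DELETE_TABLE and count_words as illustrative names; note that string.digits covers only ASCII 0-9, unlike str.isdigit():

```python
import string

# One translation table that deletes both ASCII punctuation and ASCII digits
DELETE_TABLE = str.maketrans("", "", string.punctuation + string.digits)

def count_words(sentence):
    # After deleting those characters, split() handles every kind of whitespace
    return len(sentence.translate(DELETE_TABLE).split())

print(count_words("I     am having  a   very  nice  23!@$      day. "))  # 7
```

One side effect of deleting punctuation outright is that don't becomes dont and well-known becomes wellknown, each still counted as a single word.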