How do I trim whitespace?


For leading and trailing whitespace:

s = '   foo    \t   '
print(s.strip())  # prints "foo"

Otherwise, a regular expression works:

import re
pat = re.compile(r'\s+')
s = '  \t  foo   \t   bar \t  '
print(pat.sub('', s))  # prints "foobar"
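For comparison, here is a minimal Python 3 sketch that wraps the two approaches in functions so the difference is easy to see. The function names `trim_ends` and `remove_all_whitespace` are made up for illustration and are not part of any library.

```python
import re

# Precompiled pattern matching one or more whitespace characters.
_WHITESPACE_RUN = re.compile(r"\s+")


def trim_ends(text):
    """Remove leading and trailing whitespace only (hypothetical helper)."""
    return text.strip()


def remove_all_whitespace(text):
    """Remove every run of whitespace, including runs inside the string."""
    return _WHITESPACE_RUN.sub("", text)


sample = "  \t  foo   \t   bar \t  "
print(repr(trim_ends(sample)))              # 'foo   \t   bar'
print(repr(remove_all_whitespace(sample)))  # 'foobar'
```

Note that the regex version also joins `foo` and `bar`, which is often not what you want when the inner spacing carries meaning.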


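Best answer

For whitespace on both sides, use str.strip:

s = "  \t a string example\t  "
s = s.strip()

For whitespace on the right side, use str.rstrip:

s = s.rstrip()

For whitespace on the left side, use str.lstrip:

s = s.lstrip()

You can pass any of these functions an argument listing the characters to strip, like this:

s = s.strip(' \t\n\r')

This strips any space, \t, \n, or \r characters from both sides of the string.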
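One detail that is easy to miss: the argument is treated as a set of characters, not as a prefix or suffix. A small sketch (the sample strings are invented for illustration):

```python
# Each character in the argument is stripped individually, in any order,
# until a character outside the set is reached.
url = "www.example.com"
print(url.strip("w.moc"))              # example

padded = "\r\n\t  value = 42 ;;\n"
# Only the listed characters are removed, and only from the two ends.
print(repr(padded.strip(" \t\n\r")))   # 'value = 42 ;;'
print(repr(padded.strip(" \t\n\r;")))  # 'value = 42'

# lstrip() and rstrip() accept the same kind of argument.
print(repr(padded.rstrip(" \n;")))     # '\r\n\t  value = 42'
```

If what you actually need is to remove a fixed prefix or suffix, strip() with an argument is the wrong tool, because it can eat more characters than intended.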

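The examples above only remove characters from the left-hand and right-hand sides of a string. If you also want to remove characters from the middle of the string, try re.sub:

import re
print(re.sub(r'\s+', '', s))

That should print: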

astringexample


# How to trim a multi-line string or a file

s = """ line one
\tline two\t
line three """

# Line 1 starts with a space, line 2 starts and ends with a tab, line 3 ends with a space.

s1 = s.splitlines()
print(s1)
# [' line one', '\tline two\t', 'line three ']

print([i.strip() for i in s1])
# ['line one', 'line two', 'line three']

# More details:

# We could also have used a for loop from the beginning
# (process() stands for whatever you do with each line):
for line in s.splitlines():
    line = line.strip()
    process(line)

# We could also be reading a file line by line,
# e.g. my_file = open(filename), or with open(filename) as my_file:
for line in my_file:
    line = line.strip()
    process(line)

# Moot point: note that splitlines() removed the newline characters; we can keep them by passing True
# (although strip() will then remove them anyway):
s2 = s.splitlines(True)
print(s2)
# [' line one\n', '\tline two\t\n', 'line three ']
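The line-by-line loop above can be tried without a real file. This is a sketch that uses `io.StringIO` as a stand-in for an open file; the generator name `stripped_lines` is hypothetical, and skipping blank lines is an extra assumption of mine, not something the answer asks for.

```python
import io


def stripped_lines(lines):
    """Yield each line with surrounding whitespace removed, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:  # assumption: empty lines are not interesting
            yield line


# io.StringIO behaves like a text file opened for reading.
fake_file = io.StringIO(" line one\n\tline two\t\n\nline three \n")
print(list(stripped_lines(fake_file)))
# ['line one', 'line two', 'line three']
```

With a real file you would pass the file object from `with open(filename) as my_file:` instead of the `StringIO` object.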


In Python, the trim methods are named strip:

str.strip()   # trim
str.lstrip()  # left trim
str.rstrip()  # right trim


No one has posted these regex solutions yet.

Matching:

>>> import re
>>> p = re.compile('\\s*(.*\\S)?\\s*')

>>> m = p.match('  \t blah ')
>>> m.group(1)
'blah'

>>> m = p.match('  \tbl ah  \t ')
>>> m.group(1)
'bl ah'

>>> m = p.match('  \t  ')
>>> print(m.group(1))
None

Searching (you have to handle the "only whitespace" input case differently):

>>> p1 = re.compile('\\S.*\\S')

>>> m = p1.search('  \tblah  \t ')
>>> m.group()
'blah'

>>> m = p1.search('  \tbl ah  \t ')
>>> m.group()
'bl ah'

>>> m = p1.search('  \t  ')
>>> m.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

If you use re.sub, you may remove inner whitespace, which could be undesirable.
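The matching variant can be wrapped in a small function that returns an empty string for whitespace-only input instead of None. This is only a sketch: the name `regex_trim` is hypothetical, and the `re.DOTALL` flag and `fullmatch` are my additions so that strings containing inner newlines are handled as well.

```python
import re

# Same pattern as in the answer; DOTALL lets '.' match newlines too (an addition).
_TRIM = re.compile(r"\s*(.*\S)?\s*", re.DOTALL)


def regex_trim(text):
    """Return text without leading/trailing whitespace, using the regex above."""
    match = _TRIM.fullmatch(text)
    # group(1) is None when the input is empty or contains only whitespace.
    return match.group(1) or ""


print(repr(regex_trim("  \t blah ")))      # 'blah'
print(repr(regex_trim("  \tbl ah  \t ")))  # 'bl ah'
print(repr(regex_trim("  \t  ")))          # ''
print(repr(regex_trim(" a\nb ")))          # 'a\nb'
```

For plain trimming this gives the same result as `str.strip()`, which is simpler and faster; the regex form is mainly useful when the trim is part of a larger pattern.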


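You can also use a very simple and basic function, str.replace(). It removes one kind of character per call, so pass " " for spaces and "\t" for tabs:

>>> whitespaces = "   abcd ef gh ijkl       "
>>> tabs = "\tabcde\tfgh\tijkl"

>>> print(whitespaces.replace(" ", ""))
abcdefghijkl
>>> print(tabs.replace("\t", ""))
abcdefghijkl

Simple and easy.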
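Because each str.replace() call targets exactly one substring, removing several kinds of whitespace needs one call per character. A rough sketch, with a hypothetical helper name `remove_chars`:

```python
def remove_chars(text, chars=" \t"):
    """Remove every occurrence of each character in `chars` via str.replace()."""
    for ch in chars:
        text = text.replace(ch, "")
    return text


print(remove_chars("   abcd ef gh ijkl       "))  # abcdefghijkl
print(remove_chars("\tabcde\tfgh\tijkl"))         # abcdefghijkl

# A single replace(" ", "") leaves tabs untouched:
print("a \t b".replace(" ", "") == "a\tb")        # True
```

Like the regex approach, this removes inner whitespace too, so it is not a substitute for strip() when only the ends should be cleaned.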


Try translate (shown here with Python 3's str.maketrans; on Python 2 the function is string.maketrans):

>>> import string
>>> print('\t\r\n  hello \r\n world \t\r\n')

  hello
 world

>>> tr = str.maketrans(string.whitespace, ' ' * len(string.whitespace))
>>> '\t\r\n  hello \r\n world \t\r\n'.translate(tr)
'     hello    world    '
>>> '\t\r\n  hello \r\n world \t\r\n'.translate(tr).replace(' ', '')
'helloworld'


something = "\t  please_     \t remove_  all_    \n\n\n\nwhitespaces\n\t  "
something = "".join(something.split())

Output:

please_remove_all_whitespaces

Adding Le Droid's comment to the answer. To separate with a space:

something = "\t  please     \t remove  all   extra \n\n\n\nwhitespaces\n\t  "
something = " ".join(something.split())

Output:

please remove all extra whitespaces


In general, I use the following method:

>>> myStr = "Hi\n Stack Over \r flow!"
>>> charList = [u"\u005Cn", u"\u005Cr", u"\u005Ct"]
>>> import re
>>> for i in charList:
...     myStr = re.sub(i, r"", myStr)
...
>>> myStr
'Hi Stack Over  flow!'

Note: this is only intended to remove "\n", "\r", and "\t". It does not remove extra spaces.
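The loop over three patterns can also be written as one character class, which makes a single pass over the string. A sketch under that assumption, with a hypothetical function name:

```python
import re

# One character class covering newline, carriage return and tab.
_CONTROL_WS = re.compile(r"[\n\r\t]")


def drop_line_breaks_and_tabs(text):
    """Remove \\n, \\r and \\t but leave ordinary spaces alone."""
    return _CONTROL_WS.sub("", text)


print(repr(drop_line_breaks_and_tabs("Hi\n Stack Over \r flow!")))
# 'Hi Stack Over  flow!'
```

As in the original, the two spaces that surrounded the removed `\r` are still there; collapsing them is a separate step.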


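Whitespace includes spaces, tabs, and CRLF. So an elegant one-liner string function we can use is translate (this two-argument form works on Python 2 str only):

' hello apple'.translate(None, ' \n\t\r')

Or, if you want to be thorough:

import string
' hello  apple'.translate(None, string.whitespace)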
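The `translate(None, ...)` call above raises a TypeError on Python 3, where str.translate() takes a single table. A minimal Python 3 sketch of the same idea follows; the names `DELETE_WHITESPACE` and `strip_all_whitespace` are made up for illustration.

```python
import string

# The third argument of str.maketrans() lists characters to delete.
DELETE_WHITESPACE = str.maketrans("", "", string.whitespace)


def strip_all_whitespace(text):
    """Delete every ASCII whitespace character, wherever it occurs."""
    return text.translate(DELETE_WHITESPACE)


print(strip_all_whitespace(" hello  apple"))                    # helloapple
print(strip_all_whitespace("\t\r\n  hello \r\n world \t\r\n"))  # helloworld
```

Keep in mind that `string.whitespace` only contains the ASCII whitespace characters, so something like a non-breaking space would survive this table; `"".join(text.split())` is an alternative that also covers Unicode whitespace.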


This will remove all whitespace and newlines from both the beginning and the end of a string:

>>> import re
>>> s = "  \n\t  \n   some \n text \n     "
>>> re.sub(r"^\s+|\s+$", "", s)
'some \n text'
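A quick check, as a sketch, that this pattern behaves like str.strip() on the sample string, plus what changes if the `re.MULTILINE` flag is added (the flag is not part of the answer; it is shown only because it is a common variation):

```python
import re

s = "  \n\t  \n   some \n text \n     "

# Without flags, ^ and $ anchor only at the start and end of the whole string.
trimmed = re.sub(r"^\s+|\s+$", "", s)
print(repr(trimmed))         # 'some \n text'
print(trimmed == s.strip())  # True

# With re.MULTILINE, ^ and $ also match at every line boundary,
# so whitespace next to the inner line break is removed as well.
per_line = re.sub(r"^\s+|\s+$", "", s, flags=re.MULTILINE)
print(repr(per_line))        # 'some\ntext'
```

So the flag turns a whole-string trim into a per-line trim, which may or may not be what you want.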


(re.sub(' +', ' ', (my_str.replace('\n', ' ')))).strip()

This removes all unwanted spaces and newline characters. Hope this helps.

import re
my_str = '   a     b \n c   '
formatted_str = (re.sub(' +', ' ', (my_str.replace('\n', ' ')))).strip()

This will result in:

'   a     b \n c   ' being changed to 'a b c'
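The replace-then-substitute chain can be shortened by letting `\s+` match spaces, tabs and newlines in one go. This is a sketch of that variation rather than the answer's exact method; `collapse_whitespace` is a hypothetical name.

```python
import re


def collapse_whitespace(text):
    """Replace each run of whitespace with one space, then trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


print(repr(collapse_whitespace("   a     b \n c   ")))  # 'a b c'
print(repr(collapse_whitespace("a\t\tb\r\nc")))         # 'a b c'
```

Unlike the original one-liner, this version also collapses tabs and carriage returns, so use it only if that is acceptable for your data.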


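If you are using Python 3: in your print call, finish with sep="". This removes the single space that print() otherwise inserts between its arguments; it does not change whitespace inside the strings themselves.

Example:

txt = "potatoes"
print("I love ", txt, "", sep="")

This will print: I love potatoes

Instead of: I love  potatoes

In your case, since you are trying to get rid of the \t, use sep="\t"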
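To see exactly what `sep` does, the output of print() can be captured in a string. This is a small sketch; the helper `render` is hypothetical and exists only so the spacing can be inspected with repr().

```python
import io


def render(*args, **kwargs):
    """Return what print() would write, so the spacing can be inspected."""
    buffer = io.StringIO()
    print(*args, file=buffer, **kwargs)
    return buffer.getvalue()


txt = "potatoes"
print(repr(render("I love ", txt, "")))          # 'I love  potatoes \n'
print(repr(render("I love ", txt, "", sep="")))  # 'I love potatoes\n'

# sep only affects the gaps between arguments; a tab inside an argument stays.
print(repr(render("\tI love", txt, sep="")))     # '\tI lovepotatoes\n'
```

In other words, `sep` controls what print() puts between arguments, so whitespace that is already part of a string still has to be removed with strip() or one of the other methods in this thread.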


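If you want to trim the whitespace off just the beginning and end of the string, you can do something like this:

some_string = "    Hello,    world!\n    "
new_string = some_string.strip()
# new_string is now "Hello,    world!"

This works a lot like Qt's QString::trimmed() method, in that it removes leading and trailing whitespace while leaving internal whitespace alone.

But if you'd like something like Qt's QString::simplified() method, which not only removes leading and trailing whitespace but also "squishes" all consecutive internal whitespace to one space character, you can use a combination of .split() and " ".join, like this:

some_string = "\t    Hello,  \n\t  world!\n    "
new_string = " ".join(some_string.split())
# new_string is now "Hello, world!"

In this last example, each sequence of internal whitespace is replaced with a single space, while the whitespace at the start and end of the string is still trimmed off.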
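The two behaviours can be put side by side as helper functions. The names `trimmed` and `simplified` simply mirror the Qt methods mentioned above and are not Python built-ins; treat this as a sketch.

```python
def trimmed(text):
    """Like QString::trimmed(): strip the ends, keep inner whitespace."""
    return text.strip()


def simplified(text):
    """Like QString::simplified(): strip the ends and squash inner runs."""
    return " ".join(text.split())


sample = "\t    Hello,  \n\t  world!\n    "
print(repr(trimmed(sample)))     # 'Hello,  \n\t  world!'
print(repr(simplified(sample)))  # 'Hello, world!'
print(repr(simplified("   ")))   # ''

# split() must be called without an argument for this to work:
# split(" ") keeps empty strings between consecutive spaces.
print(" ".join("a  b".split(" ")) == "a  b")  # True
```

The trick relies on the no-argument form of split(), which treats any run of whitespace as one separator and drops empty pieces at the ends.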


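Having looked at quite a few solutions here with varying degrees of understanding, I wondered what to do if the string was comma-separated...

The problem

While trying to process a CSV of contact information, I needed a solution to this problem: trim extraneous whitespace and some junk, but preserve trailing commas and internal whitespace. Working with a field containing notes on the contacts, I wanted to remove the garbage and leave the good stuff. While trimming out all the punctuation and chaff, I didn't want to lose the whitespace between compound tokens, because I didn't want to rebuild it later.

The regex and pattern: `[\s_]+?\W+`

The pattern looks for one or more whitespace characters or underscores ('_'), matched lazily (as few characters as possible) with [\s_]+?, that come immediately before one or more non-word characters, matched with \W+ (equivalent to [^a-zA-Z0-9_]). Specifically, \s covers spaces, tabs (\t), newlines (\n), form feeds (\f), and carriage returns (\r). A null character (\0) is not whitespace, but it is a non-word character, so it is picked up by the \W+ part.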
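Here is a small sketch of the pattern on short inputs, so its effect is easier to follow than on a long CSV field. The sample strings are invented for illustration.

```python
import re

trim_ws_punctn = re.compile(r"[\s_]+?\W+")

# Whitespace followed by punctuation (or more whitespace) collapses to one space;
# a single space between a comma and the next word is left alone.
sample = "hello-, -world, ffib, , , biff"
print(trim_ws_punctn.sub(" ", sample))
# hello-, world, ffib, biff

# Newlines and null characters that follow whitespace are swept up too.
noisy = "ff dd \n invites, \0, alumni"
print(trim_ws_punctn.sub(" ", noisy))
# ff dd invites, alumni

# An underscore between word characters is not followed by \W, so it stays.
print(trim_ws_punctn.sub(" ", "pitch_at_palace"))
# pitch_at_palace
```

Note that a match always starts at a whitespace character or underscore, which is why `hello-,` keeps its trailing punctuation while ` -world` loses the leading dash.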

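I see the advantage of this as two-fold:

It doesn't remove whitespace between complete words/tokens that you might want to keep together;

Python's built-in string method strip() doesn't deal with the inside of the string, just the left and right ends, and by default it strips whitespace characters (see the example below: several newlines are in the text, and strip() does not remove them, while the regex pattern does). text.strip(' \n\t\r')

This goes beyond the OP's question, but I think there are plenty of cases where we might have odd, pathological instances within the text data, as I did (somehow escape characters ended up in some of the text). Moreover, in list-like strings, we don't want to eliminate the delimiter unless the delimiter separates two whitespace characters or some non-word characters, like '-,' or '-, ,,,'.

NB: This is not about the delimiter of the CSV itself, only about instances within the CSV where the data is list-like, i.e. a comma-separated string of substrings.

Full disclosure: I've only been manipulating text for about a month, and regex only for the last two weeks, so I'm sure there are some nuances I'm missing. That said, for smaller collections of strings (mine are in a dataframe of 12,000 rows and 40-odd columns), as a final step after a pass that removes extraneous characters, this works exceptionally well, especially if you introduce some additional whitespace where you want to separate text joined by a non-word character, but don't want to add whitespace where there was none before.

An example: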

import re

text = "\"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, , , , \r, , \0, ff dd \n invites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s, \n i69rpofhfsp9t7c practice 20ignition - 20june \t\n .2134.pdf 2109                                                 \n\n\n\nklkjsdf\""
print(f"Here is the text as formatted:\n{text}\n")
print()
print("Trimming both the whitespaces and the non-word characters that follow them.")
print()
trim_ws_punctn = re.compile(r'[\s_]+?\W+')
clean_text = trim_ws_punctn.sub(' ', text)
print(clean_text)
print()
print("what about 'strip()'?")
print(f"Here is the text, formatted as is:\n{text}\n")
clean_text = text.strip(' \n\t\r')  # strip out whitespace?
print()
print(f"Here is the text, formatted as is:\n{clean_text}\n")

print()
print("Are 'text' and 'clean_text' unchanged?")
print(clean_text == text)

This outputs:

Here is the text as formatted:
"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, ,, , , ff ddinvites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s,i69rpofhfsp9t7c practice 20ignition - 20june.2134.pdf 2109


klkjsdf"
using regex to trim both the whitespaces and the non-word characters that follow them.
"portfolio, derp, hello-world, hello-, world, founders, mentors, ffib, biff, 1, 12.18.02, 12, 2013, 9874890288, ff, series a, exit, general mailing, fr, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk,  jim.somedude@blahblah.com, dd invites,subscribed,, master, dd invites,subscribed, ff dd invites, subscribed, alumni spring 2012 deck: https: www.dropbox.com s, i69rpofhfsp9t7c practice 20ignition 20june 2134.pdf 2109 klkjsdf"
Very nice.
What about 'strip()'?
Here is the text, formatted as is:
"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, ,, , , ff ddinvites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s,i69rpofhfsp9t7c practice 20ignition - 20june.2134.pdf 2109


klkjsdf"

Here is the text, after stipping with 'strip':

"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, ,, , , ff ddinvites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s,i69rpofhfsp9t7c practice 20ignition - 20june.2134.pdf 2109


klkjsdf"
Are 'text' and 'clean_text' unchanged? 'True'

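So strip() only removes whitespace from the two ends of the string, and here it changed nothing because the text begins and ends with a quote character. In the OP's case, strip() is fine. But if things get any more complex, regex and a similar pattern may be of some value for more general settings.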
