How do I remove the extra indentation from a Python triple-quoted multi-line string?

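I have a Python editor where the user enters a script or code, which is then put into a main method behind the scenes, with every line indented. The problem is that if the user has a multi-line string, the indentation applied to the whole script also affects the string, inserting a tab at the start of each of its lines. A problem script can be as simple as: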

"""foo
bar
foo2"""

So when it is inside the main method, it looks like:

def main():
    """foo
    bar
    foo2"""

The string now has an extra tab at the beginning of every line.


The only way I can see is to strip the first n tabs from each line, starting with the second, where n is the known indentation of the main method.

If that indentation is not known beforehand, you can add a trailing newline before inserting the text and then strip as many tabs as the last line has...

The third solution is to parse the data, find the beginning of each multi-line quote, and not add your indentation to any line after it until the quote is closed.

I think there is a better solution..
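The third option can be sketched with the standard tokenize module, which reports where each string literal starts and ends. This is only a rough sketch: the function name indent_script and the four-space prefix are my own assumptions, and multi-line f-strings on Python 3.12+ are tokenized differently and are not handled here.

```python
import io
import tokenize


def indent_script(source, prefix="    "):
    """Indent every line of `source`, except continuation lines of multi-line strings."""
    protected = set()  # 1-based numbers of lines that continue a string literal
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING and tok.end[0] > tok.start[0]:
            # The first line of the literal is still indented; the rest is left alone.
            protected.update(range(tok.start[0] + 1, tok.end[0] + 1))
    lines = source.splitlines(keepends=True)
    return "".join(
        line if (number in protected or not line.strip()) else prefix + line
        for number, line in enumerate(lines, start=1)
    )


user_script = 'x = """foo\nbar\nfoo2"""\nprint(x)\n'
print("def main():\n" + indent_script(user_script))
```

The string body stays at column 0 inside the generated function, which is valid Python because the parser does not treat the inside of a string literal as indentation.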

What follows the first line of a multiline string is part of the string, and not treated as indentation by the parser. You may freely write:

def main():
    """foo
bar
foo2"""
    pass

and it will do the right thing.

On the other hand, that's not readable, and Python knows it. So if a docstring contains whitespace in its second line, that amount of whitespace is stripped off when you use help() to view the docstring. Thus, help(main) and the below help(main2) produce the same help info.

def main2():
    """foo
    bar
    foo2"""
    pass
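A quick way to check this without reading help() output is inspect.getdoc, which returns the docstring after the same kind of indentation clean-up. A small sketch, reusing the two functions from this answer:

```python
import inspect


def main():
    """foo
bar
foo2"""


def main2():
    """foo
    bar
    foo2"""


# Both docstrings come back without the source-level indentation.
print(repr(inspect.getdoc(main)))   # 'foo\nbar\nfoo2'
print(repr(inspect.getdoc(main2)))  # 'foo\nbar\nfoo2'
```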

So if I get it correctly, you take whatever the user inputs, indent it properly and add it to the rest of your program (and then run that whole program).

So after you put the user input into your program, you could run a regex that basically takes that forced indentation back out. Something like: within three quotes, replace every "new line marker" followed by four spaces (or a tab) with only a "new line marker".
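A rough sketch of that regex idea follows. The names TRIPLE_QUOTED and unindent_strings are made up for illustration, and the indent is assumed to be four spaces. A regex cannot really parse Python (quotes inside comments or escaped quotes will confuse it), so treat this as a starting point only.

```python
import re

# A triple-quoted literal: the opening quotes, the body, then the same quotes again.
TRIPLE_QUOTED = re.compile(r"('''|\"{3})(.*?)\1", re.DOTALL)


def unindent_strings(code, indent="    "):
    """Remove one forced `indent` after every newline inside triple-quoted strings."""
    def fix(match):
        quote, body = match.group(1), match.group(2)
        return quote + body.replace("\n" + indent, "\n") + quote

    return TRIPLE_QUOTED.sub(fix, code)


wrapped = 'def main():\n    x = """foo\n    bar\n    foo2"""\n    return x\n'
print(unindent_strings(wrapped))
```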

textwrap.dedent from the standard library is there to automatically undo the wacky indentation.
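A minimal sketch of how that looks inside an indented function. The backslash right after the opening quotes is my own addition; it keeps the first line of the string on the same indentation level as the others.

```python
import textwrap


def main():
    text = """\
        foo
        bar
        foo2
        """
    # dedent removes the whitespace that is common to all non-blank lines.
    return textwrap.dedent(text)


print(repr(main()))  # 'foo\nbar\nfoo2\n'
```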

From what I see, a better answer here might be inspect.cleandoc, which does much of what textwrap.dedent does but also fixes the problems that textwrap.dedent has with the leading line.

The below example shows the differences:

>>> import textwrap
>>> import inspect
>>> x = """foo bar
    baz
    foobar
    foobaz
    """
>>> inspect.cleandoc(x)
'foo bar\nbaz\nfoobar\nfoobaz'
>>> textwrap.dedent(x)
'foo bar\n    baz\n    foobar\n    foobaz\n'
>>> y = """
...     foo
...     bar
... """
>>> inspect.cleandoc(y)
'foo\nbar'
>>> textwrap.dedent(y)
'\nfoo\nbar\n'
>>> z = """\tfoo
bar\tbaz
"""
>>> inspect.cleandoc(z)
'foo\nbar     baz'
>>> textwrap.dedent(z)
'\tfoo\nbar\tbaz\n'


Note that inspect.cleandoc also expands internal tabs to spaces. This may be inappropriate for one's use case, but works fine for me.

Showing the difference between textwrap.dedent and inspect.cleandoc with a little more clarity:

Behavior with the leading part not indented

import textwrap
import inspect


string1="""String
with
no indentation
       """
string2="""String
        with
        indentation
       """
print('string1 plain=' + repr(string1))
print('string1 inspect.cleandoc=' + repr(inspect.cleandoc(string1)))
print('string1 textwrap.dedent=' + repr(textwrap.dedent(string1)))
print('string2 plain=' + repr(string2))
print('string2 inspect.cleandoc=' + repr(inspect.cleandoc(string2)))
print('string2 textwrap.dedent=' + repr(textwrap.dedent(string2)))

Output

string1 plain='String\nwith\nno indentation\n       '
string1 inspect.cleandoc='String\nwith\nno indentation\n       '
string1 textwrap.dedent='String\nwith\nno indentation\n'
string2 plain='String\n        with\n        indentation\n       '
string2 inspect.cleandoc='String\nwith\nindentation'
string2 textwrap.dedent='String\n        with\n        indentation\n'

Behavior with the leading part indented

string1="""
String
with
no indentation
       """
string2="""
        String
        with
        indentation
       """


print('string1 plain=' + repr(string1))
print('string1 inspect.cleandoc=' + repr(inspect.cleandoc(string1)))
print('string1 textwrap.dedent=' + repr(textwrap.dedent(string1)))
print('string2 plain=' + repr(string2))
print('string2 inspect.cleandoc=' + repr(inspect.cleandoc(string2)))
print('string2 textwrap.dedent=' + repr(textwrap.dedent(string2)))

Output

string1 plain='\nString\nwith\nno indentation\n       '
string1 inspect.cleandoc='String\nwith\nno indentation\n       '
string1 textwrap.dedent='\nString\nwith\nno indentation\n'
string2 plain='\n        String\n        with\n        indentation\n       '
string2 inspect.cleandoc='String\nwith\nindentation'
string2 textwrap.dedent='\nString\nwith\nindentation\n'

I wanted to preserve exactly what is between the triple-quote lines, removing only the common leading indent. I found that textwrap.dedent and inspect.cleandoc didn't do it quite right, so I wrote this one. It uses os.path.commonprefix.

import re
from os.path import commonprefix


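def ql(s, eol=True):
    lines = s.splitlines()
    l0 = None
    if lines:
        # The first line follows the opening quotes; keep it out of the indent calculation.
        l0 = lines.pop(0) or None
    # Longest prefix shared by the remaining lines, cut down to its leading whitespace.
    common = commonprefix(lines)
    indent = re.match(r'\s*', common)[0]
    n = len(indent)
    lines2 = [l[n:] for l in lines]
    # With eol=False, drop the empty last line left by the closing quotes.
    if not eol and lines2 and not lines2[-1]:
        lines2.pop()
    if l0 is not None:
        lines2.insert(0, l0)
    s2 = "\n".join(lines2)
    return s2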

This can quote any string with any indent. I wanted it to include the trailing newline by default, but with an option to remove it so that it can quote any string neatly.

Example:

print(ql(r"""
     Hello
    |\---/|
    | o_o |
     \_^_/
    """))


print(ql(r"""
         World
        |\---/|
        | o_o |
         \_^_/
    """))

The second string has 4 spaces of common indentation because the final """ is indented less than the quoted text:

 Hello
|\---/|
| o_o |
 \_^_/

     World
    |\---/|
    | o_o |
     \_^_/

I thought this was going to be simpler, otherwise I wouldn't have bothered with it!

I had a similar issue: I wanted my triple quoted string to be indented, but I didn't want the string to have all those spaces at the beginning of each line. I used re to correct my issue:

        print(re.sub('\n *','\n', f"""Content-Type: multipart/mixed; boundary="===============9004758485092194316=="
            MIME-Version: 1.0
            Subject: Get the reader's attention here!
            To: recipient@email.com


            --===============9004758485092194316==
            Content-Type: text/html; charset="us-ascii"
            MIME-Version: 1.0
            Content-Transfer-Encoding: 7bit


            Very important message goes here - you can even use <b>HTML</b>.
            --===============9004758485092194316==--
            """))

Above, I was able to keep my code indented, but the string was left trimmed essentially. All spaces at the beginning of each line were deleted. This was important since any spaces in front of the SMTP or MIME specific lines would break the email message.

The tradeoff I made was that I left the Content-Type on the first line because the regex I was using didn't remove the initial \n (which broke email). If it bothered me enough, I guess I could have added an lstrip like this:

print(re.sub('\n *','\n', f"""
    Content-Type: ...
    """).lstrip())

After reading this 10 year old page, I decided to stick with re.sub since I didn't truly understand all the nuances of textwrap and inspect.

There is a much simpler way:

    foo = """first line\
\nsecond line"""

This does the trick, if I understand the question correctly. lstrip() removes leading whitespace, so it will remove tabs as well as spaces.

from os import linesep


def dedent(message):
    return linesep.join(line.lstrip() for line in message.splitlines())

Example:

name='host'
config_file='/Users/nmellor/code/cold_fusion/end-to-end/config/stage.toml'
message = f"""Missing env var or configuration entry for 'host'.
              Please add '{name}' entry to file
              {config_file}
              or export environment variable 'mqtt_{name}' before
              running the program.
           """


>>> print(message)
Missing env var or configuration entry for 'host'.
              Please add 'host' entry to file
              /Users/nmellor/code/cold_fusion/end-to-end/config/stage.toml
              or export environment variable 'mqtt_host' before
              running the program.


>>> print(dedent(message))
Missing env var or configuration entry for 'host'.
Please add 'host' entry to file
/Users/nmellor/code/cold_fusion/end-to-end/config/stage.toml
or export environment variable 'mqtt_host' before
running the program.


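The above solution will remove ALL indentation, including any relative indentation inside the string. If you only want to remove the indentation that is common to the whole multi-line string, use textwrap.dedent(). But take care that every non-blank line, including the first one right after the opening quotes, is indented; otherwise the common indentation is empty and .dedent() will remove nothing. Lines that contain only whitespace, such as the one holding the closing quotes, are ignored when the common indentation is computed.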
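The first-line caveat is easy to check. In this small sketch (the variable names are mine), the only difference between the two strings is whether the first line of text shares the indentation of the others:

```python
import textwrap

# Text starts right after the opening quotes, so the first line has no indent.
unindented_first = """foo
    bar
    baz
    """

# The backslash skips the newline after the quotes, so "foo" is indented too.
indented_first = """\
    foo
    bar
    baz
    """

print(repr(textwrap.dedent(unindented_first)))  # 'foo\n    bar\n    baz\n'
print(repr(textwrap.dedent(indented_first)))    # 'foo\nbar\nbaz\n'
```

In the first case the common indentation is empty, so only the whitespace-only last line is normalized.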