迭代字符串的时间复杂度实际上是附加 O (n ^ 2)还是 O (n) ?

我正在处理 CTCI 的一个问题。

第一章的第三个问题是取一个字符串,比如

'Mr John Smith '

并要求您将中间空格替换为 %20:

'Mr%20John%20Smith'

作者在 Python 中提供了这种解决方案,称之为 O (n) :

def urlify(string, length):
'''function replaces single spaces with %20 and removes trailing spaces'''
counter = 0
output = ''
for char in string:
counter += 1
if counter > length:
return output
elif char == ' ':
output = output + '%20'
elif char != ' ':
output = output + char
return output

我的问题是:

我知道这是从左到右扫描实际字符串的 O (n)。但是 Python 中的字符串不是一成不变的吗?如果我有一个字符串,并且我用 +操作符将另一个字符串添加到它,它是否分配了必要的空间,复制原始字符串,然后复制附加的字符串?

如果我有一个长度为1的 n字符串集合,那么需要:

1 + 2 + 3 + 4 + 5 + ... + n = n(n+1)/2

或者 O (n ^ 2)次,是吗? 或者我错误地理解了 Python 处理附加的方式?

或者,如果你愿意教我如何钓鱼: 我将如何自己去发现这一点?我在谷歌上搜索官方信息来源的尝试失败了。我找到了 https://wiki.python.org/moin/TimeComplexity但是这里没有任何关于字符串的东西。

11810 次浏览

In CPython, the standard implementation of Python, there's an implementation detail that makes this usually O(n), implemented in the code the bytecode evaluation loop calls for ABC0 or += with two string operands. If Python detects that the left argument has no other references, it calls realloc to attempt to avoid a copy by resizing the string in place. This is not something you should ever rely on, because it's an implementation detail and because if realloc ends up needing to move the string frequently, performance degrades to O(n^2) anyway.

Without the weird implementation detail, the algorithm is O(n^2) due to the quadratic amount of copying involved. Code like this would only make sense in a language with mutable strings, like C++, and even in C++ you'd want to use +=.

I found this snippet of text on Python Speed > Use the best algorithms and fastest tools:

String concatenation is best done with ''.join(seq) which is an O(n) process. In contrast, using the '+' or '+=' operators can result in an O(n^2) process because new strings may be built for each intermediate step. The CPython 2.4 interpreter mitigates this issue somewhat; however, ''.join(seq) remains the best practice

The author relies on an optimization that happens to be here, but is not explicitly dependable. strA = strB + strC is typically O(n), making the function O(n^2). However, it is pretty easy to make sure it the whole process is O(n), use an array:

output = []
# ... loop thing
output.append('%20')
# ...
output.append(char)
# ...
return ''.join(output)

In a nutshell, the append operation is amortized O(1), (although you can make it strong O(1) by pre-allocating the array to the right size), making the loop O(n).

And then the join is also O(n), but that's okay because it is outside the loop.

For future visitors: Since it is a CTCI question, any reference to learning urllib package is not required here, specifically as per OP and the book, this question is about Arrays and Strings.

Here's a more complete solution, inspired from @njzk2's pseudo:

text = 'Mr John Smith'#13
special_str = '%20'
def URLify(text, text_len, special_str):
url = []
for i in range(text_len): # O(n)
if text[i] == ' ': # n-s
url.append(special_str) # append() is O(1)
else:
url.append(text[i]) # O(1)


print(url)
return ''.join(url) #O(n)




print(URLify(text, 13, '%20'))