In [1]: s = bytearray('Hello World')
In [2]: s[:5] = 'Bye'
In [3]: s
Out[3]: bytearray(b'Bye World')
In [4]: str(s)
Out[4]: 'Bye World'
The appeal of using a bytearray is its memory efficiency and convenient syntax (the session above is Python 2; Python 3 requires bytes literals such as b'Hello World'). It can also be faster than using a temporary list:
In [36]: %timeit s = list('Hello World'*1000); s[5500:6000] = 'Bye'; s = ''.join(s)
1000 loops, best of 3: 256 µs per loop
In [37]: %timeit s = bytearray('Hello World'*1000); s[5500:6000] = 'Bye'; str(s)
100000 loops, best of 3: 2.39 µs per loop
Note that much of the difference in speed is attributable to the creation of the container:
In [32]: %timeit s = list('Hello World'*1000)
10000 loops, best of 3: 115 µs per loop
In [33]: %timeit s = bytearray('Hello World'*1000)
1000000 loops, best of 3: 1.13 µs per loop
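The sessions above are Python 2. As a rough sketch of the same slice replacement on Python 3, assuming ASCII-only text, the bytearray has to be built from bytes and decoded at the end:

```python
# Python 3: bytearray needs bytes (or a str plus an explicit encoding).
s = bytearray(b'Hello World')

# Slice assignment may change the length of the bytearray in place.
s[:5] = b'Bye'

# decode() turns the bytes back into text; str(s) would only give the repr.
result = s.decode('ascii')
print(result)
```

For non-ASCII text an explicit encoding such as UTF-8 would be needed, and slice offsets would then count bytes rather than characters.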
Depends on what you want to do. If you want a mutable sequence, the built-in list type is your friend, and going from str to list and back is simple: list(s) in one direction, ''.join(chars) in the other.
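A minimal sketch of that round trip, with illustrative variable names:

```python
text = "abcdef"

# str -> list of one-character strings, which can be edited in place
chars = list(text)
chars[0] = "X"

# list -> str
text = "".join(chars)
print(text)
```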
If you want to build a large string in a for loop, the Pythonic way is usually to build a list of strings and then join them with the proper separator (a line break or whatever).
Otherwise, you can use a text template system, a parser, or whatever specialized tool is most appropriate for the job.
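The list-then-join idiom mentioned above might look like this. The function name build_report and the sample rows are made up for illustration:

```python
def build_report(rows):
    """Collect one line per row in a list, then join once at the end."""
    lines = []
    for name, score in rows:
        lines.append(f"{name}: {score}")
    # A single join copies each fragment once, unlike repeated +=.
    return "\n".join(lines)


report = build_report([("alice", 3), ("bob", 5)])
print(report)
```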
Concatenating immutable sequences always results in a new object. This means that building up a sequence by repeated concatenation will have a quadratic runtime cost in the total sequence length. To get a linear runtime cost, you must switch to one of the alternatives below:
if concatenating str objects, you can build a list and use str.join() at the end or else write to an io.StringIO instance and retrieve its value when complete
Experiment to compare runtime of several options:
import sys
import timeit
from io import StringIO
from array import array


def test_concat():
    out_str = ''
    for _ in range(loop_count):
        out_str += 'abc'
    return out_str


def test_join_list_loop():
    str_list = []
    for _ in range(loop_count):
        str_list.append('abc')
    return ''.join(str_list)


def test_array():
    char_array = array('b')
    for _ in range(loop_count):
        char_array.frombytes(b'abc')
    # tobytes() replaces tostring(), which was removed in Python 3.9;
    # decode() returns the text instead of the "b'...'" repr that str() gives.
    return char_array.tobytes().decode('ascii')


def test_string_io():
    file_str = StringIO()
    for _ in range(loop_count):
        file_str.write('abc')
    return file_str.getvalue()


def test_join_list_compr():
    return ''.join(['abc' for _ in range(loop_count)])


def test_join_gen_compr():
    return ''.join('abc' for _ in range(loop_count))


loop_count = 80000

print(sys.version)

# Time every function whose name starts with 'test_'.
res = {}
for k, v in dict(globals()).items():
    if k.startswith('test_'):
        res[k] = timeit.timeit(v, number=10)

# Print the results, fastest first.
for k, v in sorted(res.items(), key=lambda x: x[1]):
    print('{:.5f} {}'.format(v, k))
Efficient String Concatenation in Python is a rather old article, and its main claim, that naive concatenation is far slower than joining, is no longer valid, because this case has since been optimized in CPython. From the docs:
CPython implementation detail: If s and t are both strings, some Python implementations such as CPython can usually perform an in-place optimization for assignments of the form s = s + t or s += t. When applicable, this optimization makes quadratic run-time much less likely. This optimization is both version and implementation dependent. For performance sensitive code, it is preferable to use the str.join() method which assures consistent linear concatenation performance across versions and implementations.
I've adapted their code a bit and got the following results on my machine:
# Python 2 only: uses backtick repr, xrange, the print statement,
# cStringIO and UserString.MutableString.
from cStringIO import StringIO
from UserString import MutableString
from array import array
import sys, timeit


def method1():
    out_str = ''
    for num in xrange(loop_count):
        out_str += `num`
    return out_str


def method2():
    out_str = MutableString()
    for num in xrange(loop_count):
        out_str += `num`
    return out_str


def method3():
    char_array = array('c')
    for num in xrange(loop_count):
        char_array.fromstring(`num`)
    return char_array.tostring()


def method4():
    str_list = []
    for num in xrange(loop_count):
        str_list.append(`num`)
    out_str = ''.join(str_list)
    return out_str


def method5():
    file_str = StringIO()
    for num in xrange(loop_count):
        file_str.write(`num`)
    out_str = file_str.getvalue()
    return out_str


def method6():
    out_str = ''.join([`num` for num in xrange(loop_count)])
    return out_str


def method7():
    out_str = ''.join(`num` for num in xrange(loop_count))
    return out_str


loop_count = 80000

print sys.version

print 'method1=', timeit.timeit(method1, number=10)
print 'method2=', timeit.timeit(method2, number=10)
print 'method3=', timeit.timeit(method3, number=10)
print 'method4=', timeit.timeit(method4, number=10)
print 'method5=', timeit.timeit(method5, number=10)
print 'method6=', timeit.timeit(method6, number=10)
print 'method7=', timeit.timeit(method7, number=10)
The previously provided answers are almost always best. However, sometimes the string is built up across many method calls and/or loops, so it's not necessarily natural to build up a list of lines and then join them. And since there's no guarantee you are using CPython, or that CPython's optimization will apply, an alternative approach is to just use print!
Here's an example helper class. It is trivial and probably unnecessary, but it serves to illustrate the approach (Python 3):
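A minimal sketch of such a helper, assuming print() is redirected into an io.StringIO buffer through its file argument. The class name StringBuilder and its append method are illustrative choices:

```python
import io


class StringBuilder:
    """Hypothetical helper that accumulates text by printing into a buffer."""

    def __init__(self):
        self._buffer = io.StringIO()

    def append(self, *objects, sep=' ', end=''):
        # Same semantics as print(), but the output lands in the buffer.
        print(*objects, sep=sep, end=end, file=self._buffer)

    def __str__(self):
        return self._buffer.getvalue()


sb = StringBuilder()
sb.append('a')
sb.append('b', end='\n')
sb.append('c', 'd', sep=',', end='\n')
result = str(sb)
print(repr(result))
```

Because the buffer is an ordinary file-like object, the same instance can be passed around and appended to from many methods without relying on any interpreter-specific string optimization.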
Just a test I ran on Python 3.6.2 showing that "join" still wins BIG!
from time import time


def _with_format(i):
    _st = ''
    for _ in range(i):
        _st = "{}{}".format(_st, "0")
    return _st


def _with_s(i):
    _st = ''
    for _ in range(i):
        _st = "%s%s" % (_st, "0")
    return _st


def _with_list(i):
    parts = []
    for _ in range(i):
        parts.append("0")
    return "".join(parts)


def _count_time(name, i, func):
    start = time()
    r = func(i)
    total = time() - start
    print("%s done in %ss" % (name, total))
    return r


iterationCount = 1000000

r1 = _count_time("with format", iterationCount, _with_format)
r2 = _count_time("with s", iterationCount, _with_s)
r3 = _count_time("with list and join", iterationCount, _with_list)

if r1 != r2 or r2 != r3:
    print("Not all results are the same!")
And the output was:
with format done in 17.991968870162964s
with s done in 18.36879801750183s
with list and join done in 0.12142801284790039s
I've added two tests to Roee Gavirel's code. In my runs they show that joining lists into strings is no faster than s += "something" up to Python 3.6; later versions give different results (in the Python 3.8.5 run below, list+join is the faster of the two).
Results:
Python 2.7.15rc1
Iterations: 100000
format done in 0.317540168762s
%s done in 0.151262044907s
list+join done in 0.0055148601532s
str cat done in 0.00391721725464s
Python 3.6.7
Iterations: 100000
format done in 0.35594654083251953s
%s done in 0.2868080139160156s
list+join done in 0.005924701690673828s
str cat done in 0.0054128170013427734s
f str done in 0.12870001792907715s
Python 3.8.5
Iterations: 100000
format done in 0.1859891414642334s
%s done in 0.17499303817749023s
list+join done in 0.008001089096069336s
str cat done in 0.014998912811279297s
f str done in 0.1600024700164795s
Code:
from time import time


def _with_cat(i):
    _st = ''
    for _ in range(i):
        _st += "0"
    return _st


def _with_f_str(i):
    _st = ''
    for _ in range(i):
        _st = f"{_st}0"
    return _st


def _with_format(i):
    _st = ''
    for _ in range(i):
        _st = "{}{}".format(_st, "0")
    return _st


def _with_s(i):
    _st = ''
    for _ in range(i):
        _st = "%s%s" % (_st, "0")
    return _st


def _with_list(i):
    parts = []
    for _ in range(i):
        parts.append("0")
    return "".join(parts)


def _count_time(name, i, func):
    start = time()
    r = func(i)
    total = time() - start
    print("%s done in %ss" % (name, total))
    return r


iteration_count = 100000

print('Iterations: {}'.format(iteration_count))
r1 = _count_time("format ", iteration_count, _with_format)
r2 = _count_time("%s ", iteration_count, _with_s)
r3 = _count_time("list+join", iteration_count, _with_list)
r4 = _count_time("str cat ", iteration_count, _with_cat)
r5 = _count_time("f str ", iteration_count, _with_f_str)

if len({r1, r2, r3, r4, r5}) != 1:
    print("Not all results are the same!")
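Single wall-clock measurements with time() can be noisy. As a hedged sketch (not part of the original benchmark), the += versus join comparison could also be run with the standard timeit module, taking the best of several repeats. The function names and the iteration count here are illustrative:

```python
import timeit

N = 10_000  # illustrative iteration count, smaller than in the runs above


def with_cat(n):
    s = ''
    for _ in range(n):
        s += '0'
    return s


def with_join(n):
    parts = []
    for _ in range(n):
        parts.append('0')
    return ''.join(parts)


def best_time(func, n, repeat=3, number=10):
    # Taking the minimum of several repeats reduces noise from other processes.
    return min(timeit.repeat(lambda: func(n), repeat=repeat, number=number))


timings = {f.__name__: best_time(f, N) for f in (with_cat, with_join)}
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f'{seconds:.5f}s {name}')
```

The absolute numbers will still depend on the interpreter version and implementation, which is the point the answers above keep returning to.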
In the top answer, the link from "Efficient String Concatenation in Python" no longer links to the intended page (redirects to tensorflow.org instead).
However, this page from 2004 contains the exact code cited and is probably the same article: https://waymoot.org/home/python_string/
You may have seen it already since it comes up first if you google:
efficient python StringBuilder
The closest thing Python offers to a mutable string or StringBuffer would probably be a Unicode-type array from the array standard library module. It can be useful in cases where you only want to edit small parts of the string:
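import array

# multiline_string is assumed to be an existing str whose lines all have the same length.
# Note: the 'u' typecode is deprecated in current Python 3 releases.
modifications = [(2, 3, 'h'), (0, 6, '!')]
row_width = multiline_string.index('\n') + 1  # characters per row, including the newline
strarray = array.array('u', multiline_string)
for row, column, character in modifications:
    strarray[row * row_width + column] = character
multiline_string = strarray.tounicode()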
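class StringBuffer:
    def __init__(self, s: str = None):
        self._a = [] if s is None else [s]

    def a(self, v):
        """Append str(v) and return self so calls can be chained."""
        self._a.append(str(v))
        return self

    def al(self, v):
        """Append str(v) followed by a newline."""
        self._a.append(str(v))
        self._a.append('\n')
        return self

    def ts(self, delim=''):
        """Return the accumulated fragments joined with delim."""
        return delim.join(self._a)

    def __bool__(self):
        return True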
Usage:
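sb = StringBuffer('{')
for i, (k, v) in enumerate({'k1': 'v1', 'k2': 'v2'}.items()):
    if i > 0:
        sb.a(', ')
    sb.a('"').a(k).a('": ').a('"').a(v).a('"')  # close the quote around the value
sb.a('}')
print(sb.ts())  # join with no delimiter so the fragments form a single line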