Difference between `open` and `io.BytesIO` in binary streams


Best answer

For simplicity's sake, let's consider writing instead of reading for now.

So when you use open(), like say:

with open("test.dat", "wb") as f:
    f.write(b"Hello World")
    f.write(b"Hello World")
    f.write(b"Hello World")

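After executing that, a file called test.dat will be created, containing Hello World three times. The data won't be kept in memory after it has been written to the file (unless you keep a reference to it under a name).

Now consider io.BytesIO():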

with io.BytesIO() as f:
    f.write(b"Hello World")
    f.write(b"Hello World")
    f.write(b"Hello World")

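Instead of writing the contents to a file, this writes them to an in-memory buffer. In other words, a chunk of RAM. Essentially, writing the above is equivalent to: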

buffer = b""
buffer += b"Hello World"
buffer += b"Hello World"
buffer += b"Hello World"

To match the example that uses the with statement, there would also be a del buffer at the end, since leaving the with block closes the BytesIO and discards its contents.
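
To make the with-statement point concrete, here is a minimal sketch (the name snapshot is my own, just for illustration). It copies the contents out with getvalue() while the buffer is still open, then shows that the buffer is closed once the with block has exited:

```python
import io

with io.BytesIO() as f:
    f.write(b"Hello World")
    f.write(b"Hello World")
    # Copy the contents out while the buffer is still open.
    snapshot = f.getvalue()

print(snapshot)  # the bytes copied out before the block ended
print(f.closed)  # the buffer is closed once the with block exits

try:
    f.getvalue()
except ValueError as exc:
    # A closed BytesIO no longer gives access to its data.
    print("closed buffer:", exc)
```

So if you need the data after the block, grab it with getvalue() before leaving the block, or skip the with statement and keep the buffer around.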

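The key difference here is optimization and performance. bytes objects are immutable, so each += can end up building a new object and copying everything accumulated so far, whereas io.BytesIO appends to a buffer that it grows as needed. That makes it faster than simply concatenating all the b"Hello World" one by one.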

Just to prove it, here's a small benchmark:

Concat: 1.3529 seconds
BytesIO: 0.0090 seconds

import io
import time


# Time repeated concatenation of bytes objects.
begin = time.time()
buffer = b""
for i in range(0, 50000):
    buffer += b"Hello World"
end = time.time()
seconds = end - begin
print("Concat:", seconds)


# Time the same number of writes into an in-memory buffer.
begin = time.time()
buffer = io.BytesIO()
for i in range(0, 50000):
    buffer.write(b"Hello World")
end = time.time()
seconds = end - begin
print("BytesIO:", seconds)

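Besides the performance gain, using BytesIO instead of concatenating has another advantage: a BytesIO can be used in place of a file object. Say you have a function that expects a file object to write to. You can then give it an in-memory buffer instead of a file.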
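
As a rough sketch of that idea: write_greeting below is a hypothetical function of my own (not from any library) that only assumes its argument has a write() method. The same function is handed a BytesIO and then a real file, which here lives in a temporary directory so the example cleans up nothing of yours:

```python
import io
import os
import tempfile


def write_greeting(fileobj):
    # Hypothetical helper: only relies on fileobj having a write() method.
    fileobj.write(b"Hello World")


# Hand the function an in-memory buffer...
buffer = io.BytesIO()
write_greeting(buffer)
in_memory = buffer.getvalue()

# ...or a real file on disk; the function cannot tell the difference.
path = os.path.join(tempfile.mkdtemp(), "test.dat")
with open(path, "wb") as f:
    write_greeting(f)
with open(path, "rb") as f:
    on_disk = f.read()

print(in_memory == on_disk)
```

This is also handy in tests, where you can check what a function wrote without touching the disk.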

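The difference is that open("myfile.jpg", "rb") opens myfile.jpg on disk and returns a file object that reads its contents, whereas BytesIO, again, is just a buffer holding some data in memory.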
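
For the reading side, here is a small sketch assuming you already have some bytes in memory (the variable names are mine). A BytesIO created from those bytes can be read and rewound much like a file opened with "rb":

```python
import io

data = b"Hello World"

# A buffer pre-filled with data; the read position starts at 0.
buffer = io.BytesIO(data)

first = buffer.read(5)  # reads the first five bytes
rest = buffer.read()    # reads from the current position to the end

# Rewind to the start, exactly as you would with a file object.
buffer.seek(0)
again = buffer.read()

print(first, rest, again)
```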

Since BytesIO is just a buffer, if you want to write the contents to a file later, you have to do it yourself:

buffer = io.BytesIO()
# ...
with open("test.dat", "wb") as f:
    f.write(buffer.getvalue())

Also, you didn't mention a version; I'm using Python 3. Regarding the examples: I'm using the with statement instead of calling f.close().