小开

write to a temporary file which resides in RAM

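As it turns out, the tempfile module (http://docs.python.org/library/tempfile.html) has exactly that:

tempfile.SpooledTemporaryFile([max_size=0[, mode='w+b'[, bufsize=-1[, suffix=''[, prefix='tmp'[, dir=None]]]]]])

This function operates exactly as TemporaryFile() does, except that data is spooled in memory until the file size exceeds max_size, or until the file's fileno() method is called, at which point the contents are written to disk and operation proceeds as with TemporaryFile().

The resulting file has one additional method, rollover(), which causes the file to roll over to an on-disk file regardless of its size.

The returned object is a file-like object whose _file attribute is either a StringIO object or a true file object, depending on whether rollover() has been called. This file-like object can be used in a with statement, just like a normal file.

New in version 2.6.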
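To make the spooling behaviour concrete, here is a minimal Python 3 sketch. The helper name spool_roundtrip and the sizes are made up for illustration, and _rolled is a private CPython attribute that is inspected only to show when the data has moved to disk; it is not part of the documented API.

```python
import tempfile


def spool_roundtrip(data, max_size):
    """Write data to a spooled temp file; return (data read back, rolled_to_disk)."""
    with tempfile.SpooledTemporaryFile(max_size=max_size) as tmp:
        tmp.write(data)
        # _rolled is a private CPython detail, read here only for illustration.
        rolled = tmp._rolled
        tmp.seek(0)
        return tmp.read(), rolled


# 8 bytes stay in memory; 2048 bytes exceed max_size and roll over to disk.
small, small_rolled = spool_roundtrip(b"hey, foo", max_size=1024)
large, large_rolled = spool_roundtrip(b"x" * 2048, max_size=1024)
print(small_rolled, large_rolled)
```

Either way the caller sees the same file-like interface, which is the point of the class.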

or if you're lazy and you have a tmpfs-mounted /tmp on Linux, you can just make a file there, but you have to delete it yourself and deal with naming

小开

Best answer

My suggestion would be to use a StringIO object. They emulate files, but reside in memory. So you could do something like this:

# get_zip_data() gets a zip archive containing 'foo.txt', reading 'hey, foo'


import zipfile
from StringIO import StringIO


zipdata = StringIO()
zipdata.write(get_zip_data())
myzipfile = zipfile.ZipFile(zipdata)
foofile = myzipfile.open('foo.txt')
print foofile.read()


# output: "hey, foo"

Or more simply (apologies to Vishal):

myzipfile = zipfile.ZipFile(StringIO(get_zip_data()))
for name in myzipfile.namelist():
    [ ... ]

In Python 3, use BytesIO instead of StringIO:

import zipfile
from io import BytesIO


filebytes = BytesIO(get_zip_data())
myzipfile = zipfile.ZipFile(filebytes)
for name in myzipfile.namelist():
    [ ... ]
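Since get_zip_data() is not defined anywhere in the thread, here is a self-contained Python 3 sketch of the same idea. The function make_zip_bytes is a hypothetical stand-in that builds a small archive in memory, so the example runs without a network connection.

```python
import zipfile
from io import BytesIO


def make_zip_bytes():
    """Hypothetical stand-in for get_zip_data(): build a small archive in memory."""
    buffer = BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("foo.txt", "hey, foo")
    return buffer.getvalue()


# Wrap the raw bytes in BytesIO so ZipFile gets a seekable file-like object.
with zipfile.ZipFile(BytesIO(make_zip_bytes())) as myzipfile:
    names = myzipfile.namelist()
    text = myzipfile.read("foo.txt").decode("utf-8")

print(names, text)
```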

小开

Below is a code snippet I used to fetch a zipped csv file, please have a look:

Python 2:

from StringIO import StringIO
from zipfile import ZipFile
from urllib import urlopen


resp = urlopen("http://www.test.com/file.zip")
myzip = ZipFile(StringIO(resp.read()))
for line in myzip.open(file).readlines():
    print line

Python 3:

from io import BytesIO
from zipfile import ZipFile
from urllib.request import urlopen
# or: requests.get(url).content


resp = urlopen("http://www.test.com/file.zip")
myzip = ZipFile(BytesIO(resp.read()))
for line in myzip.open(file).readlines():
    print(line.decode('utf-8'))

Here file is a string. To get the actual string that you want to pass, you can use zipfile.namelist(). For instance,

resp = urlopen('http://mlg.ucd.ie/files/datasets/bbc.zip')
myzip = ZipFile(BytesIO(resp.read()))
myzip.namelist()
# ['bbc.classes', 'bbc.docs', 'bbc.mtx', 'bbc.terms']

小开

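In Vishal's answer, it was not obvious what the file name was supposed to be when there is no file on disk. I've modified his answer to work without modification for most needs.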

from StringIO import StringIO
from zipfile import ZipFile
from urllib import urlopen


def unzip_string(zipped_string):
    unzipped_string = ''
    zipfile = ZipFile(StringIO(zipped_string))
    for name in zipfile.namelist():
        unzipped_string += zipfile.open(name).read()
    return unzipped_string
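The function above targets Python 2. A rough Python 3 equivalent might look like the sketch below; unzip_bytes is a made-up name, it returns bytes rather than str, and the two-member archive is built in memory purely for the demonstration.

```python
from io import BytesIO
from zipfile import ZipFile


def unzip_bytes(zipped_bytes):
    """Concatenate the contents of every member of an in-memory zip archive."""
    with ZipFile(BytesIO(zipped_bytes)) as archive:
        return b"".join(archive.read(name) for name in archive.namelist())


# Build a small two-file archive in memory to exercise the function.
demo_buffer = BytesIO()
with ZipFile(demo_buffer, "w") as demo_archive:
    demo_archive.writestr("a.txt", "one\n")
    demo_archive.writestr("b.txt", "two\n")

combined = unzip_bytes(demo_buffer.getvalue())
print(combined)
```

Call .decode() on the result if the members are known to be text in a particular encoding.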

小开

For completeness, I'd like to add my Python 3 answer:

from io import BytesIO
from zipfile import ZipFile
import requests


def get_zip(file_url):
    url = requests.get(file_url)
    zipfile = ZipFile(BytesIO(url.content))
    files = [zipfile.open(file_name) for file_name in zipfile.namelist()]
    return files.pop() if len(files) == 1 else files

小开

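I'd like to offer an updated Python 3 version of Vishal's excellent answer, which was using Python 2, along with some explanation of the adaptations/changes, which may have been already mentioned.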

from io import BytesIO
from zipfile import ZipFile
import urllib.request

url = urllib.request.urlopen("http://www.unece.org/fileadmin/DAM/cefact/locode/loc162txt.zip")


with ZipFile(BytesIO(url.read())) as my_zip_file:
    for contained_file in my_zip_file.namelist():
        # with open(("unzipped_and_read_" + contained_file + ".file"), "wb") as output:
        for line in my_zip_file.open(contained_file).readlines():
            print(line)
            # output.write(line)

Necessary changes:

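- There is no StringIO module in Python 3 (it has been moved to io.StringIO). Instead, I use io.BytesIO, because we will be handling a bytestream (see the docs, and also this thread).
- urlopen: the legacy urllib.urlopen function from Python 2.6 and earlier has been discontinued; urllib.request.urlopen() corresponds to the old urllib2.urlopen (see the docs and this thread).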

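Note:

In Python 3, the printed output lines will look like this: b'some text'. This is expected, as they aren't strings; remember, we're reading a bytestream. Have a look at Dan04's excellent answer.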
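If you would rather iterate over decoded text than over bytes, one option is to wrap the member in io.TextIOWrapper. The sketch below assumes the member is UTF-8 text and uses a small archive built in memory (sample.txt is a made-up name) instead of the downloaded file.

```python
import io
import zipfile

# Build a small archive in memory so the example needs no network access.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("sample.txt", "first line\nsecond line\n")

decoded_lines = []
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as my_zip_file:
    with my_zip_file.open("sample.txt") as raw:
        # TextIOWrapper decodes the byte stream lazily, line by line.
        for line in io.TextIOWrapper(raw, encoding="utf-8"):
            decoded_lines.append(line.rstrip("\n"))

print(decoded_lines)
```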

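Minor changes I made:

- Following the docs, I use with ... as instead of zipfile = ....
- The script now uses .namelist() to loop through all the files in the zip and print their contents.
- I moved the creation of the ZipFile object into the with statement, although I'm not sure whether that is better.
- In response to NumenorForLife's comment, I added (and commented out) an option to write the bytestream to a file (one per file in the zip); it adds "unzipped_and_read_" to the beginning of the filename and a ".file" extension (I prefer not to use ".txt" for files with bytestrings). The indentation of the code will, of course, need to be adjusted if you want to use it.
  - Need to be careful here: because we have a byte string, we use binary mode, so "wb"; I have a feeling that writing binary opens a can of worms anyway...
- I am using an example file, the UN/LOCODE text archive.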

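What I didn't do:

NumenorForLife asked about saving the zip to disk. I'm not sure what he meant by that. Downloading the zip file? That is a different task; see Oleh Prypin's excellent answer.

Here's a way: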

import urllib.request
import shutil


# the response yields bytes, so the output file must be opened in binary mode ('wb')
with urllib.request.urlopen("http://www.unece.org/fileadmin/DAM/cefact/locode/2015-2_UNLOCODE_SecretariatNotes.pdf") as response, open("downloaded_file.pdf", 'wb') as out_file:
    shutil.copyfileobj(response, out_file)
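The copy step can be tried offline. In this sketch a BytesIO object stands in for the HTTP response (an assumption made so the example runs without a network), and the payload bytes are made up. Because the source yields bytes, the destination is opened in binary mode.

```python
import io
import os
import shutil
import tempfile

payload = b"%PDF-1.4 example bytes"   # made-up content for illustration
response = io.BytesIO(payload)        # stand-in for the urlopen() response

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "downloaded_file.pdf")
    # Binary mode is required: copyfileobj passes bytes straight through.
    with open(out_path, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
    with open(out_path, "rb") as check:
        saved = check.read()

print(saved == payload)
```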

小开

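Vishal's example, however great, confuses when it comes to the file name, and I do not see the merit of redefining 'zipfile'.

Here is my example that downloads a zip that contains some files, one of which is a csv file that I subsequently read into a pandas DataFrame: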

from StringIO import StringIO
from zipfile import ZipFile
from urllib import urlopen
import pandas


url = urlopen("https://www.federalreserve.gov/apps/mdrm/pdf/MDRM.zip")
zf = ZipFile(StringIO(url.read()))
for item in zf.namelist():
    print("File in zip: " + item)
# find the first matching csv file in the zip:
match = [s for s in zf.namelist() if ".csv" in s][0]
# the first line of the file contains a string - that line shall be ignored, hence skiprows
df = pandas.read_csv(zf.open(match), low_memory=False, skiprows=[0])

(Note, I use Python 2.7.13)

This is the exact solution that worked for me. I just tweaked it a little bit for Python 3 by removing StringIO and adding the io library.

Python 3 version:

from io import BytesIO
from zipfile import ZipFile
import pandas
import requests


url = "https://www.nseindia.com/content/indices/mcwb_jun19.zip"
content = requests.get(url)
zf = ZipFile(BytesIO(content.content))


for item in zf.namelist():
    print("File in zip: " + item)


# find the first matching csv file in the zip:
match = [s for s in zf.namelist() if ".csv" in s][0]
# the first line of the file contains a string - that line shall be ignored, hence skiprows
df = pandas.read_csv(zf.open(match), low_memory=False, skiprows=[0])

小开

Adding on to the other answers, using requests:

# download from web


import requests
url = 'http://mlg.ucd.ie/files/datasets/bbc.zip'
content = requests.get(url)


# unzip the content
from io import BytesIO
from zipfile import ZipFile
f = ZipFile(BytesIO(content.content))
print(f.namelist())


# outputs ['bbc.classes', 'bbc.docs', 'bbc.mtx', 'bbc.terms']

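Use help(f) to get more details on the available functions, e.g. extractall(), which extracts the contents of the zip file so that they can later be used with open.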
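As a rough illustration of extracting and then opening a member, the sketch below writes an in-memory archive to a temporary directory. The member name bbc.terms is borrowed from the listing above, but its contents here are invented, and the temporary directory is an assumption made to keep the example self-cleaning.

```python
import io
import os
import tempfile
import zipfile

# Build a small archive in memory; the member contents are made up.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("bbc.terms", "ad\nsale\n")

with tempfile.TemporaryDirectory() as target_dir:
    with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as f:
        f.extractall(target_dir)  # writes every member under target_dir
    # Once extracted, the member is an ordinary file that open() can read.
    with open(os.path.join(target_dir, "bbc.terms"), encoding="utf-8") as extracted:
        terms = extracted.read().splitlines()

print(terms)
```

Note that this does write to disk, unlike the in-memory approaches elsewhere in the thread.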

小开

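Use the zipfile module. To extract a file from a URL, you need to wrap the result of a urlopen call in a BytesIO object. This is because the result of a web request returned by urlopen doesn't support seeking: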

from urllib.request import urlopen


from io import BytesIO
from zipfile import ZipFile


zip_url = 'http://example.com/my_file.zip'


with urlopen(zip_url) as f:
    with BytesIO(f.read()) as b, ZipFile(b) as myzipfile:
        foofile = myzipfile.open('foo.txt')
        print(foofile.read())

If you already have the file downloaded locally, you don't need BytesIO; just open it in binary mode and pass it to ZipFile directly:

from zipfile import ZipFile


zip_filename = 'my_file.zip'


with open(zip_filename, 'rb') as f:
    with ZipFile(f) as myzipfile:
        foofile = myzipfile.open('foo.txt')
        print(foofile.read().decode('utf-8'))

Again, note that you have to open the file in binary ('rb') mode, not as text, or else you'll get a zipfile.BadZipFile: File is not a zip file error.

It's good practice to use all of these as context managers with the with statement, so that they are closed properly.
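The BadZipFile error is also what you get when the downloaded bytes are not an archive at all, for example an HTML error page. A small defensive sketch, where try_open is a hypothetical helper name and both inputs are made-up test data:

```python
import io
import zipfile


def try_open(data):
    """Return the member names of a zip given as bytes, or None if it is not a zip."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return archive.namelist()
    except zipfile.BadZipFile:
        return None


# A valid archive built in memory, and some bytes that are clearly not a zip.
valid_buffer = io.BytesIO()
with zipfile.ZipFile(valid_buffer, "w") as archive:
    archive.writestr("foo.txt", "hey, foo")

valid_names = try_open(valid_buffer.getvalue())
invalid_names = try_open(b"<html>404 Not Found</html>")
print(valid_names, invalid_names)
```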

小开

All of these answers appear too bulky and long. Use requests to shorten the code, e.g.:

import requests, zipfile, io
r = requests.get(zip_file_url)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall("/path/to/directory")

Downloading and unzipping a .zip file without writing to disk
