Fast Concatenation of Multiple GZip Files

I have a list of gzip files:

file1.gz
file2.gz
file3.gz

Is there a way to concatenate or gzip these files into a single gzip file without having to decompress them?

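In practice, we will use this in a web database (CGI): it takes a query from the user, lists all the files matching that query, and returns them to the user in a single batch file.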


With gzip files, you can simply concatenate the files together, like so:

cat file1.gz file2.gz file3.gz > allfiles.gz

Per the gzip RFC (RFC 1952):

A gzip file consists of a series of "members" (compressed data sets). [...] The members simply appear one after another in the file, with no additional information before, between, or after them.
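To see the "no additional information between them" part in practice, here is a small sketch that concatenates two members and checks that the byte immediately after the first member is the gzip magic number (1f 8b) of the second. The file names and contents are made up for the demonstration.

```shell
#!/bin/sh
# Sketch: the second member starts right after the first, with no separator,
# and gunzip decodes both members back to back.
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical sample members.
printf 'alpha\n' | gzip -c > file1.gz
printf 'beta\n'  | gzip -c > file2.gz
cat file1.gz file2.gz > allfiles.gz

# Member sizes in bytes (tr strips the padding some wc versions print).
size1=$(wc -c < file1.gz | tr -d ' ')
size2=$(wc -c < file2.gz | tr -d ' ')

# tail -c +N is 1-based, so +$((size1 + 1)) starts at the first byte after member 1.
next=$(tail -c +"$((size1 + 1))" allfiles.gz | head -c 2 | od -An -tx1 | tr -d ' \n')
echo "bytes at offset $size1: $next"

# Decompressing the whole file yields both inputs in order.
gunzip -c allfiles.gz
```

The total size of allfiles.gz is exactly the sum of the two member sizes, which is another way of saying nothing was added between them.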

Note that this is not exactly the same as building a single gzip file of the concatenated data; among other things, all of the original filenames are preserved. However, gunzip decompresses it to exactly the concatenation of the original data, so for most purposes the two are equivalent.

Since existing tools generally ignore the filename headers of the additional members, it is not easy to extract individual files from the result. If you need that, build a ZIP file instead. ZIP and GZIP both use the DEFLATE algorithm for the actual compression (ZIP also supports other compression methods; method 8 is the one that corresponds to GZIP's compression); the difference lies in the metadata format. Since the metadata is not compressed, it is simple enough to strip off the gzip headers and attach ZIP file headers and a central directory record instead. Refer to the gzip format specification and the ZIP format specification.
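A minimal sketch of reading the per-member metadata by hand, following the RFC 1952 header layout: byte 2 is the compression method (08 means DEFLATE), and when the FNAME flag is set the original file name starts at byte 10. It assumes GNU gzip, which records the input file name by default when compressing a named file (the -n option turns that off), and that no FEXTRA field is present, which gzip does not write by default.

```shell
#!/bin/sh
# Sketch: pull the compression method and stored file names out of a
# two-member gzip file. Assumes GNU gzip and no FEXTRA field.
set -e
work=$(mktemp -d)
cd "$work"

printf 'alpha\n' > file1
printf 'beta\n'  > file2
gzip -c file1 > file1.gz     # gzip records the name "file1" in the header
gzip -c file2 > file2.gz
cat file1.gz file2.gz > allfiles.gz

size1=$(wc -c < file1.gz | tr -d ' ')

# Byte 2 (0-based) of the first member: the CM field.
cm=$(head -c 3 allfiles.gz | tail -c 1 | od -An -tx1 | tr -d ' \n')

# The name begins at offset 10 of each member; both names here are 5 bytes long.
name1=$(tail -c +11 allfiles.gz | head -c 5)
name2=$(tail -c +"$((size1 + 11))" allfiles.gz | head -c 5)
echo "CM=$cm first=$name1 second=$name2"
```

Because these header fields sit outside the compressed stream, a converter could read them (plus the CRC-32 and size from each member's trailer) and copy the raw DEFLATE data into ZIP entries without recompressing.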

You can create a tar file of these files and then gzip the tar file to create the new gzip file:

tar -cvf newcombined.tar file1.gz file2.gz file3.gz
gzip newcombined.tar
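A sketch of the tar route end to end, with hypothetical sample files. Unlike plain cat, the archive keeps each .gz file as a separately addressable member, so one of them can be pulled out by name. Note that the stored files are already compressed, so the outer gzip pass typically gains little, and an extracted member still has to be gunzipped to be read.

```shell
#!/bin/sh
# Sketch: bundle existing .gz files with tar, compress the tar,
# then list the archive and extract a single member.
set -e
work=$(mktemp -d)
cd "$work"

for f in file1 file2 file3; do
    printf '%s contents\n' "$f" > "$f"
    gzip -c "$f" > "$f.gz"
done

tar -cvf newcombined.tar file1.gz file2.gz file3.gz
gzip newcombined.tar                 # replaces it with newcombined.tar.gz

tar -tzf newcombined.tar.gz          # lists file1.gz, file2.gz, file3.gz

mkdir out
tar -xzf newcombined.tar.gz -C out file2.gz
gunzip -c out/file2.gz               # the extracted member is still gzip data
```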

Here is what man 1 gzip says about your requirement.

Multiple compressed files can be concatenated. In this case, gunzip will extract all members at once. For example:

gzip -c file1  > foo.gz
gzip -c file2 >> foo.gz

Then

gunzip -c foo

is equivalent to

cat file1 file2

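Note that if your inputs are already compressed (file1.gz instead of file1), you should append them with cat (cat file1.gz >> foo.gz) rather than gzip -c, which would compress them a second time.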
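The man page example can be checked directly with a short script. The sample contents and file3.gz are hypothetical; the last step appends an input that is already compressed using cat, so it becomes a third member instead of being compressed again.

```shell
#!/bin/sh
# Sketch: the man page example, then appending an already-compressed member.
set -e
work=$(mktemp -d)
cd "$work"

printf 'first\n'  > file1
printf 'second\n' > file2

gzip -c file1  > foo.gz
gzip -c file2 >> foo.gz

# gunzip -c decodes both members, matching cat file1 file2.
gunzip -c foo.gz > from_gunzip.txt
cat file1 file2  > from_cat.txt
cmp from_gunzip.txt from_cat.txt && echo "identical"

# Already-compressed input: append the bytes as-is with cat.
printf 'third\n' | gzip -c > file3.gz
cat file3.gz >> foo.gz
gunzip -c foo.gz
```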

Note this part:

gunzip will extract all members at once

So to extract the members individually, you will need an additional tool, or will have to write one yourself.

However, this is also addressed in the man page:

If you wish to create a single archive file with multiple members so that members can later be extracted independently, use an archiver such as tar or zip. GNU tar supports the -z option to invoke gzip transparently. gzip is designed as a complement to tar, not as a replacement.

Just use cat. It is very fast (0.2 seconds for 500 MB in my case):

cat *gz > final
mv final final.gz

You can then read the output with zcat to make sure it looks right:

zcat final.gz
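One thing to watch with cat *gz is ordering: the shell expands the glob in sorted (collation) order, so part10.gz lands before part2.gz. Writing to final and renaming afterwards also keeps the output from matching *gz if the command is run again in the same directory. The sketch below uses hypothetical part1.gz to part10.gz and assumes GNU sort (for -V) and file names without spaces; gzip -t then tests every member of the result.

```shell
#!/bin/sh
# Sketch: concatenate members in numeric order instead of glob order.
# Assumes GNU sort -V and file names without whitespace.
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical inputs: partN.gz holds the number N.
i=1
while [ "$i" -le 10 ]; do
    printf '%s\n' "$i" | gzip -c > "part$i.gz"
    i=$((i + 1))
done

# Version sort puts part2.gz before part10.gz.
printf '%s\n' part*.gz | sort -V | xargs cat > final
mv final final.gz

gzip -t final.gz          # integrity check of all members
gunzip -c final.gz
```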

I tried the other answer's 'gzip -c' approach, but I ended up with garbage when using already-gzipped files as input (I guess it compressed them a second time).
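The "garbage" is consistent with double compression: when gzip -c is given a file that is already gzip data, it wraps it in a second gzip layer. This sketch (with a hypothetical file1.gz) shows that one round of gunzip on such output still yields data starting with the gzip magic bytes 1f 8b, while a plain cat copy decodes to the original text.

```shell
#!/bin/sh
# Sketch: gzip -c on a .gz file adds a second compression layer.
set -e
work=$(mktemp -d)
cd "$work"

printf 'hello\n' | gzip -c > file1.gz

gzip -c file1.gz > wrong.gz     # double-compressed
cat file1.gz > right.gz         # copies the member unchanged

# After one decompression, wrong.gz still starts with the gzip magic.
gunzip -c wrong.gz | head -c 2 | od -An -tx1 | tr -d ' \n'; echo
gunzip -c right.gz
```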

PV:

Better yet, if you have it installed, use pv instead of cat:

pv *gz > final
mv final final.gz

This gives you a progress bar as it works, but does the same thing as cat.
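Since pv is not always installed, a script can pick it when available and fall back to cat otherwise; both produce the same byte-for-byte concatenation. The file names below are hypothetical.

```shell
#!/bin/sh
# Sketch: use pv for a progress bar if present, otherwise plain cat.
set -e
work=$(mktemp -d)
cd "$work"

printf 'one\n' | gzip -c > a.gz
printf 'two\n' | gzip -c > b.gz

if command -v pv >/dev/null 2>&1; then
    pv a.gz b.gz > final      # progress is written to stderr
else
    cat a.gz b.gz > final
fi
mv final final.gz

gunzip -c final.gz
```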