Edit: You can even pipe the data directly through a transform such as gunzip (using request):
var request = require('request'),
    zlib = require('zlib'),
    fs = require('fs'),
    out = fs.createWriteStream('out');

// Fetch http://example.com/foo.gz, gunzip it and store the results in 'out'
request('http://example.com/foo.gz').pipe(zlib.createGunzip()).pipe(out);
For tar archives, there is Isaacs' tar module, which is used by npm.
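A sketch of how that could look, assuming the modern node-tar API (tar.x auto-detects gzip, so a .tar.gz can be piped straight in; the URL is hypothetical):

var request = require('request'),
    tar = require('tar');

// Fetch a tarball and extract it into './out'.
// Note: per the node-tar docs, the target directory must already exist.
request('http://example.com/foo.tar.gz')
    .pipe(tar.x({ cwd: './out' }));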
Edit 2: Updated answer as zlib doesn't support the zip format. This will only work for gzip.
yauzl is a robust library for unzipping. Design principles:
Follow the spec. Don't scan for local file headers. Read the central directory for file metadata.
Don't block the JavaScript thread. Use and provide async APIs.
Keep memory usage under control. Don't attempt to buffer entire files in RAM at once.
Never crash (if used properly). Don't let malformed zip files bring down client applications that are trying to catch errors.
Catch unsafe file names. An entry's file name triggers an error if it starts with "/" or matches /[A-Za-z]:\//, or if it contains ".." path segments or "\" (per the spec).
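A minimal usage sketch, following the pattern from yauzl's README (lazy entry reading plus openReadStream):

var yauzl = require('yauzl');

yauzl.open('./example.zip', { lazyEntries: true }, function (err, zipfile) {
    if (err) throw err;
    zipfile.readEntry();
    zipfile.on('entry', function (entry) {
        if (/\/$/.test(entry.fileName)) {
            // Directory entry: nothing to stream, read the next one
            zipfile.readEntry();
        } else {
            zipfile.openReadStream(entry, function (err, readStream) {
                if (err) throw err;
                readStream.on('end', function () { zipfile.readEntry(); });
                readStream.pipe(process.stdout); // or pipe to a write stream
            });
        }
    });
});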
I found success with the following; it works with .zip files.
(Simplified here for posting: no error checking & just unzips all files to the current folder)
function DownloadAndUnzip(URL) {
    var unzip = require('unzip');
    var http = require('http');

    http.get(URL, function (response) {
        response.pipe(unzip.Extract({ path: './' }));
    });
}
I tried a few of the Node.js unzip libraries, including adm-zip and unzip, then settled on extract-zip, which is a wrapper around yauzl. It seemed the simplest to implement.
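Something like this (a sketch: it uses extract-zip's classic callback API, while newer versions return a Promise instead; dir must be an absolute path):

var extract = require('extract-zip');
var path = require('path');

extract('./example.zip', { dir: path.resolve('./output') }, function (err) {
    if (err) { console.error('Extraction failed:', err); return; }
    console.log('Extraction complete');
});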
For an ancient and pervasive technology such as unzip, I would expect there to exist a fairly popular, mature Node.js unzip library that is "stagnant" and "unmaintained" because it is "complete".
However, most libraries appear either to be completely terrible or to have had commits as recently as just a few months ago. This is quite concerning... so I've gone through several unzip libraries, read their docs, and tried their examples to try to figure out WTF. Here's what I ended up with:
Winner: node-stream-zip
'use strict';

var fs = require('fs');
var path = require('path');
var StreamZip = require('node-stream-zip');

var zip = new StreamZip({
    file: './example.zip',
    storeEntries: true
});

zip.on('error', function (err) { console.error('[ERROR]', err); });

zip.on('ready', function () {
    console.log('All entries read: ' + zip.entriesCount);
    //console.log(zip.entries());
});

zip.on('entry', function (entry) {
    var pathname = path.resolve('./temp', entry.name);
    if (/\.\./.test(path.relative('./temp', pathname))) {
        console.warn("[zip warn]: ignoring maliciously crafted paths in zip file:", entry.name);
        return;
    }

    if ('/' === entry.name[entry.name.length - 1]) {
        console.log('[DIR]', entry.name);
        return;
    }

    console.log('[FILE]', entry.name);
    zip.stream(entry.name, function (err, stream) {
        if (err) { console.error('Error:', err.toString()); return; }

        stream.on('error', function (err) { console.log('[ERROR]', err); return; });

        // example: print contents to screen
        //stream.pipe(process.stdout);

        // example: save contents to file
        fs.mkdir(
            path.dirname(pathname),
            { recursive: true },
            function (err) {
                stream.pipe(fs.createWriteStream(pathname));
            }
        );
    });
});
Security Warning:
Not sure if node-stream-zip itself checks entry.name for maliciously crafted paths that would resolve incorrectly (such as ../../../foo or /etc/passwd).
You can easily check this yourself with /\.\./.test(path.relative('./to/dir', path.resolve('./to/dir', entry.name))).
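Wrapped up as a tiny helper (isSafeEntryName is a hypothetical name, not part of any library):

var path = require('path');

function isSafeEntryName(destDir, entryName) {
    var resolved = path.resolve(destDir, entryName);
    // If the entry would escape destDir, path.relative() contains ".." segments
    return !/\.\./.test(path.relative(destDir, resolved));
}

console.log(isSafeEntryName('./temp', 'docs/readme.txt')); // true
console.log(isSafeEntryName('./temp', '../../../foo'));    // false
console.log(isSafeEntryName('./temp', '/etc/passwd'));     // false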
Pros: (Why do I think it's the best?)
can unzip normal files (maybe not some crazy ones with weird extensions)
can stream
seems to not have to load the whole zip to read entries
has examples in normal JavaScript (not compiled)
doesn't include the kitchen sink (i.e. url loading, S3, or db layers)
uses some existing code from a popular library
doesn't have too much senseless hipster or ninja-foo in the code
Cons:
Swallows errors like a hungry hippo
Throws strings instead of errors (no stack traces)
zip.extract() doesn't seem to work (hence I used zip.stream() in my example)
Runner up: node-unzipper
Install:
npm install --save unzipper
Usage:
'use strict';

var fs = require('fs');
var unzipper = require('unzipper');

fs.createReadStream('./example.zip')
    .pipe(unzipper.Parse())
    .on('entry', function (entry) {
        var fileName = entry.path;
        var type = entry.type; // 'Directory' or 'File'
        console.log();

        if (/\/$/.test(fileName)) {
            console.log('[DIR]', fileName, type);
            return;
        }

        console.log('[FILE]', fileName, type);

        // TODO: probably also needs the security check
        entry.pipe(process.stdout/*fs.createWriteStream('output/path')*/);
        // NOTE: To ignore, use entry.autodrain() instead of entry.pipe()
    });
Pros:
Seems to work in a similar manner to node-stream-zip, but with less control
A more functional fork of unzip
Seems to run in serial rather than in parallel
Cons:
Kitchen sink much? Just includes a ton of stuff that's not related to unzipping
Reads the whole file (chunk by chunk, which is fine) rather than doing random seeks
If you don't need to put multiple files into an archive, but rather want to compress a single file or just some string contents, then zlib.deflateRaw/zlib.inflateRaw can be used.
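For example, a quick in-memory round trip with the synchronous zlib APIs:

var zlib = require('zlib');

// Compress a string to raw DEFLATE data (no zlib/gzip header)
var compressed = zlib.deflateRawSync('HelloZip!');
console.log(compressed.toString('base64'));

// ...and restore it
console.log(zlib.inflateRawSync(compressed).toString('utf8')); // HelloZip!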
Here is an example of how to compress an in-memory string on macOS/iOS and decompress it in Node.js.
// Swift, macOS/iOS
let data = "HelloZip!".data(using: .utf8)!
let compressedData = (data as NSData).compressed(using: .zlib) as Data
let compressedDataAsBase64EncodedString = compressedData.base64EncodedString()
print(compressedDataAsBase64EncodedString)
// Prints: 80jNycmPyixQBAA=
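And a sketch of the Node.js side, decompressing the base64 string printed above (Apple's .zlib option emits raw DEFLATE, which is what inflateRaw expects):

// Node.js
var zlib = require('zlib');

var compressedData = Buffer.from('80jNycmPyixQBAA=', 'base64');
var data = zlib.inflateRawSync(compressedData);
console.log(data.toString('utf8')); // Prints: HelloZip!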