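Read content from files inside a Zip file

I am trying to create a simple Java program that reads and extracts the contents of the files inside a zip file. The zip file contains 3 files (txt, pdf, docx). I need to read the contents of all of these files, and I am using Apache Tika for this purpose.

Can someone help me achieve this functionality? I have tried the code below, but with no success.

Code snippet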

public class SampleZipExtract {




public static void main(String[] args) {


List<String> tempString = new ArrayList<String>();
StringBuffer sbf = new StringBuffer();


File file = new File("C:\\Users\\xxx\\Desktop\\abc.zip");
InputStream input;
try {


input = new FileInputStream(file);
ZipInputStream zip = new ZipInputStream(input);
ZipEntry entry = zip.getNextEntry();


BodyContentHandler textHandler = new BodyContentHandler();
Metadata metadata = new Metadata();


Parser parser = new AutoDetectParser();


while (entry!= null){


if(entry.getName().endsWith(".txt") ||
entry.getName().endsWith(".pdf")||
entry.getName().endsWith(".docx")){
System.out.println("entry=" + entry.getName() + " " + entry.getSize());
parser.parse(input, textHandler, metadata, new ParseContext());
tempString.add(textHandler.toString());
}
}
zip.close();
input.close();


for (String text : tempString) {
System.out.println("Apache Tika - Converted input string : " + text);
sbf.append(text);
System.out.println("Final text from all the three files " + sbf.toString());
}
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (SAXException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (TikaException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}

Because entry is assigned once before the loop and never updated inside it, the while condition never changes and the loop might never break:

while (entry != null) {
// If entry never becomes null here, loop will never break.
}

Instead of the null check there, you can try this:

ZipEntry entry = null;
while ((entry = zip.getNextEntry()) != null) {
// Rest of your code
}
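To make the corrected loop concrete, here is a minimal sketch that walks a zip with ZipInputStream and counts the uncompressed bytes of each entry. The class name ZipStreamSizes and the method entrySizes are made up for illustration. Note that the bytes are read from the ZipInputStream itself, not from the underlying FileInputStream; in the question's code that would likely mean handing zip rather than input to the parser.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of the question's code.
class ZipStreamSizes {
    /** Returns entry name -> number of uncompressed bytes read for that entry. */
    static Map<String, Long> entrySizes(InputStream zipData) throws IOException {
        Map<String, Long> sizes = new LinkedHashMap<>();
        byte[] buffer = new byte[8192];
        try (ZipInputStream zip = new ZipInputStream(zipData)) {
            ZipEntry entry;
            // getNextEntry() is called on every iteration, so the loop ends
            // once the archive has no more entries.
            while ((entry = zip.getNextEntry()) != null) {
                long total = 0;
                int n;
                // read() returns -1 at the end of the current entry
                while ((n = zip.read(buffer)) != -1) {
                    total += n;
                }
                sizes.put(entry.getName(), total);
            }
        }
        return sizes;
    }
}
```

Calling ZipStreamSizes.entrySizes(new FileInputStream("abc.zip")) would give you one map entry per file in the archive.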

If you're wondering how to get the file content from each ZipEntry, it's actually quite simple. Here's some sample code:

public static void main(String[] args) throws IOException {
ZipFile zipFile = new ZipFile("C:/test.zip");


Enumeration<? extends ZipEntry> entries = zipFile.entries();


while(entries.hasMoreElements()){
ZipEntry entry = entries.nextElement();
InputStream stream = zipFile.getInputStream(entry);
}
}

Once you have the InputStream you can read it however you want.
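For instance, a rough sketch of reading every entry as text could look like this. It assumes the entries are UTF-8 text (which will not be true for pdf or docx, where you would pass the stream to a parser instead) and Java 9+ for readAllBytes. The names ZipTextReader and readTextEntries are invented for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper for illustration only.
class ZipTextReader {
    /** Returns entry name -> entry content decoded as UTF-8, skipping directories. */
    static Map<String, String> readTextEntries(Path zipPath) throws IOException {
        Map<String, String> contents = new LinkedHashMap<>();
        // try-with-resources closes the ZipFile and each entry stream
        try (ZipFile zipFile = new ZipFile(zipPath.toFile())) {
            Enumeration<? extends ZipEntry> entries = zipFile.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;
                }
                try (InputStream stream = zipFile.getInputStream(entry)) {
                    // Assumption: the entry is UTF-8 text
                    contents.put(entry.getName(),
                            new String(stream.readAllBytes(), StandardCharsets.UTF_8));
                }
            }
        }
        return contents;
    }
}
```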

Here is sample code you can use to let Tika take care of container files for you: http://wiki.apache.org/tika/RecursiveMetadata

From what I can tell, the accepted solution will not work for cases where there are nested zip files. Tika, however, will take care of such situations as well.

My way of achieving this is to create a class that wraps ZipInputStream and exposes only the stream of the current entry:

The wrapper class:

public class ZippedFileInputStream extends InputStream {


private ZipInputStream is;


public ZippedFileInputStream(ZipInputStream is){
this.is = is;
}


@Override
public int read() throws IOException {
return is.read();
}


@Override
public void close() throws IOException {
is.closeEntry();
}

}

The use of it:

ZipInputStream zipInputStream = new ZipInputStream(new FileInputStream("SomeFile.zip"));
ZipEntry entry;

while ((entry = zipInputStream.getNextEntry()) != null) {


ZippedFileInputStream archivedFileInputStream = new ZippedFileInputStream(zipInputStream);


//... perform whatever logic you want here with ZippedFileInputStream


// note that this will only close the current entry stream and not the ZipInputStream
archivedFileInputStream.close();


}
zipInputStream.close();

One advantage of this approach: InputStreams are passed as arguments to methods that process them, and those methods tend to close the input stream as soon as they are done with it. With the wrapper, such a close only ends the current entry and leaves the ZipInputStream open for the next one.
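Here is a small sketch of that advantage in action. The consumer firstLineAndClose is a made-up method that closes whatever stream it receives, and EntryStream is just a compact restatement of the wrapper idea (the class and method names are mine, not from any library). Because close() only ends the current entry, the loop can still move on to the next one.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class CloseShieldDemo {
    /** Same idea as the wrapper: close() ends the entry, not the archive. */
    static class EntryStream extends InputStream {
        private final ZipInputStream zip;

        EntryStream(ZipInputStream zip) { this.zip = zip; }

        @Override public int read() throws IOException { return zip.read(); }

        // Delegating the bulk read avoids byte-at-a-time reads
        @Override public int read(byte[] b, int off, int len) throws IOException {
            return zip.read(b, off, len);
        }

        @Override public void close() throws IOException { zip.closeEntry(); }
    }

    /** Hypothetical consumer that closes the stream it is given. */
    static String firstLineAndClose(InputStream in) throws IOException {
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line = reader.readLine();
            return line == null ? "" : line;
        }
    }

    /** Collects "name: first line" for every file entry in the zip data. */
    static List<String> firstLines(InputStream zipData) throws IOException {
        List<String> lines = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(zipData)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // The consumer closes the wrapper, which only closes the entry
                lines.add(entry.getName() + ": " + firstLineAndClose(new EntryStream(zip)));
            }
        }
        return lines;
    }
}
```

If you passed the ZipInputStream itself to firstLineAndClose, the next call to getNextEntry() would fail because the whole stream would already be closed.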

As of Java 7, the NIO API provides a better and more generic way of accessing the contents of ZIP or JAR files. Actually, it is now a unified API which allows you to treat ZIP files much like normal files.

To extract all of the files contained in a ZIP file with this API, you could do something like the following.

In Java 8

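private void extractAll(URI fromZip, Path toDirectory) throws IOException {
    // fromZip must use the jar: scheme, e.g. jar:file:/path/to/archive.zip
    try (FileSystem zipFs = FileSystems.newFileSystem(fromZip, Collections.emptyMap())) {
        for (Path root : zipFs.getRootDirectories()) {
            try (Stream<Path> paths = Files.walk(root)) {
                for (Path path : (Iterable<Path>) paths::iterator) {
                    // Map the path inside the zip onto the destination directory
                    Path target = toDirectory.resolve(root.relativize(path).toString());
                    if (Files.isDirectory(path)) {
                        Files.createDirectories(target);
                    } else {
                        Files.copy(path, target, StandardCopyOption.REPLACE_EXISTING);
                    }
                }
            }
        }
    }
}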

In Java 7

private void extractAll(URI fromZip, final Path toDirectory) throws IOException {
    // fromZip must use the jar: scheme, e.g. jar:file:/path/to/archive.zip
    try (FileSystem zipFs =
            FileSystems.newFileSystem(fromZip, Collections.<String, Object>emptyMap())) {
        for (final Path root : zipFs.getRootDirectories()) {
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                        throws IOException {
                    // You can do anything you want with the path here;
                    // this copies it to the matching location under toDirectory
                    Path target = toDirectory.resolve(root.relativize(file).toString());
                    Files.copy(file, target, StandardCopyOption.REPLACE_EXISTING);
                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                        throws IOException {
                    // Create each sub-directory of the destination directory
                    // before files are copied into it
                    Files.createDirectories(toDirectory.resolve(root.relativize(dir).toString()));
                    return FileVisitResult.CONTINUE;
                }
            });
        }
    }
}
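Since the zip is exposed as a file system, you don't have to extract anything if you only want to read one entry. Here is a minimal sketch, assuming the entry is UTF-8 text. The names ZipNioReader and readEntry are invented for the example, and the (ClassLoader) cast is there because a bare null is ambiguous between overloads on newer JDKs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only.
class ZipNioReader {
    /** Reads a single entry of the zip as UTF-8 text without extracting the archive. */
    static String readEntry(Path zipPath, String entryName) throws IOException {
        // Opens the zip as a file system; closing it releases the archive
        try (FileSystem zipFs = FileSystems.newFileSystem(zipPath, (ClassLoader) null)) {
            // Paths inside the zip work with the usual Files methods
            Path inside = zipFs.getPath(entryName);
            return new String(Files.readAllBytes(inside), StandardCharsets.UTF_8);
        }
    }
}
```

For example, ZipNioReader.readEntry(Paths.get("C:/test.zip"), "notes.txt") would return the text of notes.txt, where notes.txt is just a placeholder entry name.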

I did mine like this, using JDK 15. Remember to change the URL and the zip file name to suit your case.

import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Scanner;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.io.*;
import java.util.*;
import java.nio.file.Paths;


class Main {
public static void main(String[] args) throws MalformedURLException,FileNotFoundException,IOException{
String url,kfile;
Scanner getkw = new Scanner(System.in);
System.out.println(" Please Paste Url ::");
url = getkw.nextLine();
System.out.println("Please enter name of file you want to save as :: ");
kfile = getkw.nextLine();
getkw.close();
Main Dinit = new Main();
System.out.println(Dinit.dloader(url, kfile));
ZipFile Vanilla = new ZipFile(new File("Vanilla.zip"));
Enumeration<? extends ZipEntry> entries = Vanilla.entries();


while(entries.hasMoreElements()){
ZipEntry entry = entries.nextElement();
if (entry.isDirectory()) {
continue;
}
// Write this entry's bytes (not the whole archive) to a file named after the entry
try (InputStream stream = Vanilla.getInputStream(entry);
FileOutputStream outter = new FileOutputStream(new File(entry.getName()))) {
outter.write(stream.readAllBytes());
}
}


}
private String dloader(String kurl, String fname)throws IOException{
String status = "Status: Good";
// Download from the URL that was passed in and save it as Vanilla.zip
try (InputStream in = new URL(kurl).openStream();
FileOutputStream out = new FileOutputStream(new File("Vanilla.zip"))) { // Output file
out.write(in.readAllBytes());
} catch (MalformedURLException e) {
status = "Status: MalformedURLException occurred";
} catch (IOException e) {
status = "Status: IOException occurred";
}
String path="\\tkwgter5834\\";
extractor(fname,"tkwgter5834",path);
    



return status;
}
private String extractor(String fname,String dir,String path){
File folder = new File(dir);
if(!folder.exists()){
folder.mkdir();
}
return "";
}
}