Determine if two files store the same content

How would you write a Java function boolean sameContent(Path file1, Path file2) that determines whether the two given paths point to files that store the same content? First, of course, I would check whether the file sizes are equal, since that is a necessary condition for storing the same content. Beyond that, I'd like to hear your approaches. If the two files are stored on the same hard drive (as in most of my cases), jumping back and forth between the two streams too many times is probably not the best approach.


This should help you with your problem:

package test;


import java.io.File;
import java.io.IOException;


import org.apache.commons.io.FileUtils;


public class CompareFileContents {


public static void main(String[] args) throws IOException {


File file1 = new File("test1.txt");
File file2 = new File("test2.txt");
File file3 = new File("test3.txt");


boolean compare1and2 = FileUtils.contentEquals(file1, file2);
boolean compare2and3 = FileUtils.contentEquals(file2, file3);
boolean compare1and3 = FileUtils.contentEquals(file1, file3);


System.out.println("Are test1.txt and test2.txt the same? " + compare1and2);
System.out.println("Are test2.txt and test3.txt the same? " + compare2and3);
System.out.println("Are test1.txt and test3.txt the same? " + compare1and3);
}
}

This is exactly what the FileUtils.contentEquals method of Apache Commons IO does; see its API documentation for details.

Try something like:

File file1 = new File("file1.txt");
File file2 = new File("file2.txt");
boolean isTwoEqual = FileUtils.contentEquals(file1, file2);

It does the following checks before actually doing the comparison:

  • Both files exist.
  • Both paths refer to regular files, not directories.
  • The lengths in bytes are the same (if they differ, the contents cannot be equal).
  • The two paths do not refer to one and the same file.
  • Only then are the contents compared.
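
The checks listed above can be sketched with plain java.nio.file calls. This is only an illustration of the order of the checks, not the Commons IO source: the class and method names are hypothetical, and treating two missing files as equal is an assumption made for this sketch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class PreChecks {

    // Hypothetical helper that runs the cheap checks before touching the content.
    static boolean contentEqualsSketch(Path a, Path b) throws IOException {
        boolean aExists = Files.exists(a);
        if (aExists != Files.exists(b)) {
            return false;                 // only one of the two files exists
        }
        if (!aExists) {
            return true;                  // assumption: two missing files count as equal
        }
        if (Files.isDirectory(a) || Files.isDirectory(b)) {
            throw new IOException("Can only compare files, not directories");
        }
        if (Files.size(a) != Files.size(b)) {
            return false;                 // different lengths cannot hold the same content
        }
        if (Files.isSameFile(a, b)) {
            return true;                  // one and the same file
        }
        // Simplest possible content check; suitable for small files only.
        return Arrays.equals(Files.readAllBytes(a), Files.readAllBytes(b));
    }
}
```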

If you don't want to use any external libraries, simply read the files into byte arrays and compare them with Arrays.equals (requires Java 7 or later):

byte[] f1 = Files.readAllBytes(file1);
byte[] f2 = Files.readAllBytes(file2);
boolean same = Arrays.equals(f1, f2);

If the files are large, then instead of reading the entire files into arrays, use a BufferedInputStream and read the files chunk by chunk.
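
A minimal sketch of such a chunk-by-chunk comparison might look like the following. The class name, the fill helper, and the 16 KB chunk size are assumptions chosen for illustration. The helper exists because InputStream.read(byte[], int, int) may return fewer bytes than requested, so each buffer is filled in a loop before the two chunks are compared.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class ChunkedCompare {

    private static final int CHUNK = 16 * 1024; // illustrative chunk size

    static boolean sameContent(Path file1, Path file2) throws IOException {
        if (Files.size(file1) != Files.size(file2)) {
            return false;
        }
        try (InputStream in1 = new BufferedInputStream(Files.newInputStream(file1));
             InputStream in2 = new BufferedInputStream(Files.newInputStream(file2))) {
            byte[] buf1 = new byte[CHUNK];
            byte[] buf2 = new byte[CHUNK];
            while (true) {
                int n1 = fill(in1, buf1);
                int n2 = fill(in2, buf2);
                if (n1 != n2) {
                    return false;         // one stream ended before the other
                }
                if (n1 == 0) {
                    return true;          // both streams are exhausted
                }
                for (int i = 0; i < n1; i++) {
                    if (buf1[i] != buf2[i]) {
                        return false;
                    }
                }
            }
        }
    }

    // Reads until the buffer is full or the stream ends; returns the number of bytes read.
    private static int fill(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break;
            }
            total += n;
        }
        return total;
    }
}
```

Only two chunk-sized buffers are held in memory at any time, regardless of the file size.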

If the files are small, you can read both into the memory and compare the byte arrays.

If the files are not small, you can either compute a hash of each file's content (e.g. MD5 or SHA-1), one file after the other, and compare the hashes (this still leaves a very small chance of error due to collisions), or you can compare the contents directly, but for that you still have to alternate between reading the two streams.
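
The hash-based variant could be sketched as follows. Each file is read sequentially from start to end, so there is no switching between the two files. The class and method names are hypothetical, and SHA-256 is used here in place of the MD5 or SHA-1 mentioned above; any algorithm supported by MessageDigest would work the same way.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class HashCompare {

    // Streams the file through the digest in 16 KB chunks.
    static byte[] sha256(Path file) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support SHA-256.
            throw new IllegalStateException(e);
        }
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[16 * 1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return md.digest();
    }

    static boolean sameContentByHash(Path file1, Path file2) throws IOException {
        if (Files.size(file1) != Files.size(file2)) {
            return false;
        }
        // Equal hashes mean equal content only up to the (tiny) collision probability.
        return MessageDigest.isEqual(sha256(file1), sha256(file2));
    }
}
```

Note that hashing always reads both files completely, whereas a direct comparison can stop at the first differing byte.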

Here is an example:

boolean sameContent(Path file1, Path file2) throws IOException {
final long size = Files.size(file1);
if (size != Files.size(file2))
return false;


if (size < 4096)
return Arrays.equals(Files.readAllBytes(file1), Files.readAllBytes(file2));


try (InputStream is1 = Files.newInputStream(file1);
InputStream is2 = Files.newInputStream(file2)) {
// Compare byte-by-byte.
// Note that this can be sped up drastically by reading large chunks
// (e.g. 16 KB), but care must be taken, as InputStream.read(byte[])
// does not necessarily fill the whole array!
int data;
while ((data = is1.read()) != -1)
if (data != is2.read())
return false;
}


return true;
}

Since Java 12 there is the method Files.mismatch, which returns the position of the first mismatching byte, or -1 if the contents of the two files do not differ. The function would then look like this:

private static boolean sameContent(Path file1, Path file2) throws IOException {
return Files.mismatch(file1, file2) == -1;
}

package test;


import org.junit.jupiter.api.Test;


import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;


import static org.junit.jupiter.api.Assertions.assertEquals;


public class CSVResultDIfference {


@Test
public void csvDifference() throws IOException {
Path file_F = FileSystems.getDefault().getPath("C:\\Projekts\\csvTestX", "yolo2.csv");
long size_F = Files.size(file_F);
Path file_I = FileSystems.getDefault().getPath("C:\\Projekts\\csvTestZ", "yolo2.csv");
long size_I = Files.size(file_I);
assertEquals(size_F, size_I);


}
}

It worked for me :) Note that this test only compares the file sizes, not the contents themselves.

If this is for a unit test, AssertJ provides a method named hasSameContentAs. An example:

Assertions.assertThat(file1).hasSameContentAs(file2);

I know I'm pretty late to the party on this one, but memory-mapped IO is a pretty simple way to do this if you want to use straight Java APIs and no third-party dependencies. It takes only a few calls to open the files, map them, and then use ByteBuffer.equals(Object) to compare them.

This is probably going to give you the best performance if you expect the particular file to be large because you're offloading a majority of the IO legwork onto the OS and the otherwise highly optimized bits of the JVM (assuming you're using a decent JVM).

Straight from the FileChannel JavaDoc:

For most operating systems, mapping a file into memory is more expensive than reading or writing a few tens of kilobytes of data via the usual read and write methods. From the standpoint of performance it is generally only worth mapping relatively large files into memory.

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;




public class MemoryMappedCompare {


public static boolean areFilesIdenticalMemoryMapped(final Path a, final Path b) throws IOException {
try (final FileChannel fca = FileChannel.open(a, StandardOpenOption.READ);
final FileChannel fcb = FileChannel.open(b, StandardOpenOption.READ)) {
final MappedByteBuffer mbba = fca.map(FileChannel.MapMode.READ_ONLY, 0, fca.size());
final MappedByteBuffer mbbb = fcb.map(FileChannel.MapMode.READ_ONLY, 0, fcb.size());
return mbba.equals(mbbb);
}
}
}
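
One limitation of the version above: FileChannel.map cannot map more than Integer.MAX_VALUE bytes in a single region, so it throws for files larger than about 2 GB. A possible workaround, sketched here under assumptions (the class name and the caller-supplied window size are hypothetical), is to map and compare the files one window at a time.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class WindowedMappedCompare {

    // Compares two files by mapping them window by window.
    // 'window' is the maximum number of bytes mapped per file at a time.
    static boolean sameContent(Path a, Path b, long window) throws IOException {
        if (window <= 0 || window > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("window must be in 1..Integer.MAX_VALUE");
        }
        try (FileChannel fca = FileChannel.open(a, StandardOpenOption.READ);
             FileChannel fcb = FileChannel.open(b, StandardOpenOption.READ)) {
            long size = fca.size();
            if (size != fcb.size()) {
                return false;
            }
            for (long pos = 0; pos < size; pos += window) {
                long len = Math.min(window, size - pos);
                MappedByteBuffer ma = fca.map(FileChannel.MapMode.READ_ONLY, pos, len);
                MappedByteBuffer mb = fcb.map(FileChannel.MapMode.READ_ONLY, pos, len);
                if (!ma.equals(mb)) {
                    return false;         // first differing window ends the comparison
                }
            }
            return true;
        }
    }
}
```

A call such as WindowedMappedCompare.sameContent(a, b, 64L * 1024 * 1024) would compare the files in 64 MB windows; the best window size depends on the platform and is not something this sketch tries to tune.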


This is compatible with JRE 6 and later, library-free, and doesn't read all of the content into memory at once.

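import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.logging.Level;
import java.util.logging.Logger;

public static boolean sameFile(File a, File b) {
    if (a == null || b == null) {
        return false;
    }
    // The same path means the same file, hence the same content.
    if (a.getAbsolutePath().equals(b.getAbsolutePath())) {
        return true;
    }
    if (!a.exists() || !b.exists()) {
        return false;
    }
    if (a.length() != b.length()) {
        return false;
    }

    RandomAccessFile rafA = null;
    RandomAccessFile rafB = null;
    try {
        rafA = new RandomAccessFile(a, "r");
        rafB = new RandomAccessFile(b, "r");
        FileChannel channelA = rafA.getChannel();
        FileChannel channelB = rafB.getChannel();

        // Note: a single mapping is limited to Integer.MAX_VALUE bytes.
        long channelsSize = channelA.size();
        ByteBuffer buff1 = channelA.map(FileChannel.MapMode.READ_ONLY, 0, channelsSize);
        ByteBuffer buff2 = channelB.map(FileChannel.MapMode.READ_ONLY, 0, channelsSize);
        for (int i = 0; i < channelsSize; i++) {
            if (buff1.get(i) != buff2.get(i)) {
                return false;
            }
        }
        return true;
    } catch (IOException ex) {
        // Also covers FileNotFoundException. A file that cannot be read
        // must not be reported as equal, so return false here.
        Logger.getLogger(HotUtils.class.getName()).log(Level.SEVERE, null, ex);
        return false;
    } finally {
        // Close both files (and their channels) on every path.
        closeQuietly(rafA);
        closeQuietly(rafB);
    }
}

private static void closeQuietly(Closeable c) {
    if (c != null) {
        try {
            c.close();
        } catch (IOException ignored) {
            // Nothing useful can be done if closing fails.
        }
    }
}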