Read the file through and count the number of newline characters. An easy way to read a file in Java, one line at a time, is the java.util.Scanner class.
BufferedReader reader = new BufferedReader(new FileReader("file.txt"));
int lines = 0;
while (reader.readLine() != null) lines++;
reader.close();
Update: To answer the performance-question raised here, I made a measurement. First thing: 20.000 lines are too few, to get the program running for a noticeable time. I created a text-file with 5 million lines. This solution (started with java without parameters like -server or -XX-options) needed around 11 seconds on my box. The same with wc -l (UNIX command-line-tool to count lines), 11 seconds. The solution reading every single character and looking for '\n' needed 104 seconds, 9-10 times as much.
Reader r = new FileReader("f.txt");
int count = 0;
int nextchar = 0;
while (nextchar != -1){
nextchar = r.read();
if (nextchar == Character.getNumericValue('\n') ){
count++;
}
}
My search for a simple example has createde one thats actually quite poor. calling read() repeadedly for a single character is less than optimal. see here for examples and measurements.
All previous answers suggest to read though the whole file and count the amount of newlines you find while doing this. You commented some as "not effective" but thats the only way you can do that. A "line" is nothing else as a simple character inside the file. And to count that character you must have a look at every single character within the file.
If the already posted answers aren't fast enough you'll probably have to look for a solution specific to your particular problem.
For example if these text files are logs that are only appended to and you regularly need to know the number of lines in them you could create an index. This index would contain the number of lines in the file, when the file was last modified and how large the file was then. This would allow you to recalculate the number of lines in the file by skipping over all the lines you had already seen and just reading the new lines.
Try the unix "wc" command. I don't mean use it, I mean download the source and see how they do it. It's probably in c, but you can easily port the behavior to java. The problem with making your own is to account for the ending cr/lf problem.
This is about as efficient as it can get, buffered binary read, no string conversion,
FileInputStream stream = new FileInputStream("/tmp/test.txt");
byte[] buffer = new byte[8192];
int count = 0;
int n;
while ((n = stream.read(buffer)) > 0) {
for (int i = 0; i < n; i++) {
if (buffer[i] == '\n') count++;
}
}
stream.close();
System.out.println("Number of lines: " + count);
import java.io.*;
public class Counter {
public final static void main(String[] args) throws IOException {
if (args.length > 0) {
File file = new File(args[0]);
System.out.println(countLines(file));
}
}
public final static int countLines(File file) throws IOException {
ProcessBuilder builder = new ProcessBuilder("wc", "-l", file.getAbsolutePath());
Process process = builder.start();
InputStream in = process.getInputStream();
LineNumberReader reader = new LineNumberReader(new InputStreamReader(in));
String line = reader.readLine();
if (line != null) {
return Integer.parseInt(line.trim().split(" ")[0]);
} else {
return -1;
}
}
}
Do You need exact number of lines or only its approximation? I happen to process large files in parallel and often I don't need to know exact count of lines - I then revert to sampling. Split the file into ten 1MB chunks and count lines in each chunk, then multiply it by 10 and You'll receive pretty good approximation of line count.
This solution is about 3.6× faster than the top rated answer when tested on a file with 13.8 million lines. It simply reads the bytes into a buffer and counts the \n characters. You could play with the buffer size, but on my machine, anything above 8KB didn't make the code faster.
private int countLines(File file) throws IOException {
int lines = 0;
FileInputStream fis = new FileInputStream(file);
byte[] buffer = new byte[BUFFER_SIZE]; // BUFFER_SIZE = 8 * 1024
int read;
while ((read = fis.read(buffer)) != -1) {
for (int i = 0; i < read; i++) {
if (buffer[i] == '\n') lines++;
}
}
fis.close();
return lines;
}
Old post, but I have a solution that could be usefull for next people.
Why not just use file length to know what is the progression? Of course, lines has to be almost the same size, but it works very well for big files:
public static void main(String[] args) throws IOException {
File file = new File("yourfilehere");
double fileSize = file.length();
System.out.println("=======> File size = " + fileSize);
InputStream inputStream = new FileInputStream(file);
InputStreamReader inputStreamReader = new InputStreamReader(inputStream, "iso-8859-1");
BufferedReader bufferedReader = new BufferedReader(inputStreamReader);
int totalRead = 0;
try {
while (bufferedReader.ready()) {
String line = bufferedReader.readLine();
// LINE PROCESSING HERE
totalRead += line.length() + 1; // we add +1 byte for the newline char.
System.out.println("Progress ===> " + ((totalRead / fileSize) * 100) + " %");
}
} finally {
bufferedReader.close();
}
}
It allows to see the progression without doing any full read on the file. I know it depends on lot of elements, but I hope it will be usefull :).
[Edition]
Here is a version with estimated time. I put some SYSO to show progress and estimation. I see that you have a good time estimation errors after you have treated enough line (I try with 10M lines, and after 1% of the treatment, the time estimation was exact at 95%).
I know, some values has to be set in variable. This code is quickly written but has be usefull for me. Hope it will be for you too :).
long startProcessLine = System.currentTimeMillis();
int totalRead = 0;
long progressTime = 0;
double percent = 0;
int i = 0;
int j = 0;
int fullEstimation = 0;
try {
while (bufferedReader.ready()) {
String line = bufferedReader.readLine();
totalRead += line.length() + 1;
progressTime = System.currentTimeMillis() - startProcessLine;
percent = (double) totalRead / fileSize * 100;
if ((percent > 1) && i % 10000 == 0) {
int estimation = (int) ((progressTime / percent) * (100 - percent));
fullEstimation += progressTime + estimation;
j++;
System.out.print("Progress ===> " + percent + " %");
System.out.print(" - current progress : " + (progressTime) + " milliseconds");
System.out.print(" - Will be finished in ===> " + estimation + " milliseconds");
System.out.println(" - estimated full time => " + (progressTime + estimation));
}
i++;
}
} finally {
bufferedReader.close();
}
System.out.println("Ended in " + (progressTime) + " seconds");
System.out.println("Estimative average ===> " + (fullEstimation / j));
System.out.println("Difference: " + ((((double) 100 / (double) progressTime)) * (progressTime - (fullEstimation / j))) + "%");
Feel free to improve this code if you think it's a good solution.
If you don't specify the character encoding, the default one used is UTF-8. You may specify an alternate encoding to match your particular data file as shown in the example above.