End of File (EOF) in C

I am currently reading the book C Programming Language by Ritchie & Kernighan. And I am pretty confused about the usage of EOF in the getchar() function.

First, I want to know why the value of EOF is -1 and why the value of getchar()!=EOF is 0. Pardon me for my question but I really don't understand. I really tried but I can't.

Then I tried to run the example on the book that can count the number of characters using the code below but it seems that I never get out of the loop even if I press enter so I am wondering when would I reach the EOF?

main(){
long nc;
nc = 0;
while (getchar() != EOF)
++nc;
printf("%ld\n", nc);
}

Then, I read the same problem at Problem with EOF in C. Most people advised that instead of using EOF, use the terminator \n or the null terminator '\0' which makes a lot of sense.

Does it mean that the example on the book serves another purpose?

430474 次浏览

That's a lot of questions.

  1. Why EOF is -1: usually -1 in POSIX system calls is returned on error, so i guess the idea is "EOF is kind of error"

  2. any boolean operation (including !=) returns 1 in case it's TRUE, and 0 in case it's FALSE, so getchar() != EOF is 0 when it's FALSE, meaning getchar() returned EOF.

  3. in order to emulate EOF when reading from stdin press Ctrl+D

EOF indicates "end of file". A newline (which is what happens when you press enter) isn't the end of a file, it's the end of a line, so a newline doesn't terminate this loop.

The code isn't wrong[*], it just doesn't do what you seem to expect. It reads to the end of the input, but you seem to want to read only to the end of a line.

The value of EOF is -1 because it has to be different from any return value from getchar that is an actual character. So getchar returns any character value as an unsigned char, converted to int, which will therefore be non-negative.

If you're typing at the terminal and you want to provoke an end-of-file, use CTRL-D (unix-style systems) or CTRL-Z (Windows). Then after all the input has been read, getchar() will return EOF, and hence getchar() != EOF will be false, and the loop will terminate.

[*] well, it has undefined behavior if the input is more than LONG_MAX characters due to integer overflow, but we can probably forgive that in a simple example.

EOF is -1 because that's how it's defined. The name is provided by the standard library headers that you #include. They make it equal to -1 because it has to be something that can't be mistaken for an actual byte read by getchar(). getchar() reports the values of actual bytes using positive number (0 up to 255 inclusive), so -1 works fine for this.

The != operator means "not equal". 0 stands for false, and anything else stands for true. So what happens is, we call the getchar() function, and compare the result to -1 (EOF). If the result was not equal to EOF, then the result is true, because things that are not equal are not equal. If the result was equal to EOF, then the result is false, because things that are equal are not (not equal).

The call to getchar() returns EOF when you reach the "end of file". As far as C is concerned, the 'standard input' (the data you are giving to your program by typing in the command window) is just like a file. Of course, you can always type more, so you need an explicit way to say "I'm done". On Windows systems, this is control-Z. On Unix systems, this is control-D.

The example in the book is not "wrong". It depends on what you actually want to do. Reading until EOF means that you read everything, until the user says "I'm done", and then you can't read any more. Reading until '\n' means that you read a line of input. Reading until '\0' is a bad idea if you expect the user to type the input, because it is either hard or impossible to produce this byte with a keyboard at the command prompt :)