如何防止扫描造成缓冲区溢出在 C?

我用这个代码:

while ( scanf("%s", buf) == 1 ){

防止可能的缓冲区溢出的最佳方法是什么,以便可以传递随机长度的字符串?

我知道我可以通过调用以下例子来限制输入字符串:

while ( scanf("%20s", buf) == 1 ){

但我希望能够处理用户输入的任何内容。 或者这不能安全地使用 Scanf 来完成,而我应该使用 fgets 吗?

125150 次浏览

Directly using scanf(3) and its variants poses a number of problems. Typically, users and non-interactive use cases are defined in terms of lines of input. It's rare to see a case where, if enough objects are not found, more lines will solve the problem, yet that's the default mode for scanf. (If a user didn't know to enter a number on the first line, a second and third line will probably not help.)

At least if you fgets(3) you know how many input lines your program will need, and you won't have any buffer overflows...

Limiting the length of the input is definitely easier. You could accept an arbitrarily-long input by using a loop, reading in a bit at a time, re-allocating space for the string as necessary...

But that's a lot of work, so most C programmers just chop off the input at some arbitrary length. I suppose you know this already, but using fgets() isn't going to allow you to accept arbitrary amounts of text - you're still going to need to set a limit.

Most of the time a combination of fgets and sscanf does the job. The other thing would be to write your own parser, if the input is well formatted. Also note your second example needs a bit of modification to be used safely:

#define LENGTH          42
#define str(x)          # x
#define xstr(x)         str(x)


/* ... */
int nc = scanf("%"xstr(LENGTH)"[^\n]%*[^\n]", array);

The above discards the input stream upto but not including the newline (\n) character. You will need to add a getchar() to consume this.

If you are using gcc, you can use the GNU-extension a specifier to have scanf() allocate memory for you to hold the input:

int main()
{
char *str = NULL;


scanf ("%as", &str);
if (str) {
printf("\"%s\"\n", str);
free(str);
}
return 0;
}

Edit: As Jonathan pointed out, you should consult the scanf man pages as the specifier might be different (%m) and you might need to enable certain defines when compiling.

In their book The Practice of Programming (which is well worth reading), Kernighan and Pike discuss this problem, and they solve it by using snprintf() to create the string with the correct buffer size for passing to the scanf() family of functions. In effect:

int scanner(const char *data, char *buffer, size_t buflen)
{
char format[32];
if (buflen == 0)
return 0;
snprintf(format, sizeof(format), "%%%ds", (int)(buflen-1));
return sscanf(data, format, buffer);
}

Note, this still limits the input to the size provided as 'buffer'. If you need more space, then you have to do memory allocation, or use a non-standard library function that does the memory allocation for you.


Note that the POSIX 2008 (2013) version of the scanf() family of functions supports a format modifier m (an assignment-allocation character) for string inputs (%s, %c, %[). Instead of taking a char * argument, it takes a char ** argument, and it allocates the necessary space for the value it reads:

char *buffer = 0;
if (sscanf(data, "%ms", &buffer) == 1)
{
printf("String is: <<%s>>\n", buffer);
free(buffer);
}

If the sscanf() function fails to satisfy all the conversion specifications, then all the memory it allocated for %ms-like conversions is freed before the function returns.

It's not that much work to make a function that's allocating the needed memory for your string. That's a little c-function i wrote some time ago, i always use it to read in strings.

It will return the read string or if a memory error occurs NULL. But be aware that you have to free() your string and always check for its return value.

#define BUFFER 32


char *readString()
{
char *str = malloc(sizeof(char) * BUFFER), *err;
int pos;
for(pos = 0; str != NULL && (str[pos] = getchar()) != '\n'; pos++)
{
if(pos % BUFFER == BUFFER - 1)
{
if((err = realloc(str, sizeof(char) * (BUFFER + pos + 1))) == NULL)
free(str);
str = err;
}
}
if(str != NULL)
str[pos] = '\0';
return str;
}