Socket accept - "Too many open files"

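I am working on a school project in which I have to write a multi-threaded server, and I am now comparing it against Apache by running some tests on it. I am using autobench to help with that, but after I run a few tests, or if I give it too high a rate (around 600+) to make the connections, I get a "Too many open files" error.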

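After I am done handling a request, I always call close() on the socket. I have also tried the shutdown() function, but nothing seems to help. Is there any way around this?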


It can take a bit of time before a closed socket is really freed up.

Use lsof to list open files.

Run cat /proc/sys/fs/file-max to see if there's a system limit.

There are multiple places where Linux can have limits on the number of file descriptors you are allowed to open.

You can check the following:

cat /proc/sys/fs/file-max

That will give you the system-wide limit on file descriptors.

On the shell level, this will tell you your personal limit:

ulimit -n

This can be changed in /etc/security/limits.conf - it's the nofile param.

However, if you're closing your sockets correctly, you shouldn't receive this unless you're opening a lot of simultaneous connections. It sounds like something is preventing your sockets from being closed appropriately. I would verify that they are being handled properly.

When your program has more open descriptors than the open files ulimit (ulimit -a will list this), the kernel will refuse to open any more file descriptors. Make sure you don't have any file descriptor leaks - for example, by running it for a while, then stopping and seeing if any extra fds are still open when it's idle - and if it's still a problem, change the nofile ulimit for your user in /etc/security/limits.conf
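The leak check described above (run for a while, go idle, see whether extra fds are still open) can be done from inside the server. The following is only a minimal sketch, assuming Linux with /proc mounted; countOpenFds() and reportOpenFds() are hypothetical helper names, not part of any library.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <dirent.h>

// Counts this process's open descriptors by listing /proc/self/fd
// (assumption: Linux with /proc mounted). Returns -1 if that fails.
int countOpenFds()
{
    DIR* dir = opendir("/proc/self/fd");
    if (dir == nullptr) return -1;

    const int own = dirfd(dir);   // the descriptor opendir() itself is using
    assert(own >= 0);

    int count = 0;
    while (dirent* entry = readdir(dir)) {
        if (entry->d_name[0] == '.') continue;          // skip "." and ".."
        if (std::atoi(entry->d_name) == own) continue;  // skip our own listing fd
        ++count;
    }
    closedir(dir);
    return count;
}

// Hypothetical helper: log the count with a label, e.g. at startup and when idle.
void reportOpenFds(const char* label)
{
    std::fprintf(stderr, "%s: %d open descriptors\n", label, countOpenFds());
}
```

Calling reportOpenFds("startup") once and reportOpenFds("idle") after a benchmark run should print the same number if nothing is leaking; a count that keeps growing between idle periods points at a leak.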

TCP has a feature called "TIME_WAIT" that ensures connections are closed cleanly. It requires one end of the connection to stay listening for a while after the socket has been closed.

In a high-performance server, it's important that it's the clients who go into TIME_WAIT, not the server. Clients can afford to have a port open, whereas a busy server can rapidly run out of ports or have too many open FDs.

To achieve this, the server should never close the connection first -- it should always wait for the client to close it.
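One way to read "wait for the client to close it" is sketched below: after the response has been sent, the server keeps reading until recv() reports end-of-stream, and only then closes its own end. This is an illustration under assumptions, not a complete policy: closeAfterPeer() is a hypothetical name, the socket is assumed to be blocking, and a real server would also need a timeout so that a client that never closes cannot pin the descriptor forever.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Reads and discards data until the peer closes the connection, then closes
// our end. Returns true if the peer closed first (recv() returned 0), false
// if we closed because of an error. Assumes a blocking socket and no timeout.
bool closeAfterPeer(int fd)
{
    assert(fd >= 0);
    char buf[4096];
    for (;;) {
        const ssize_t n = recv(fd, buf, sizeof buf, 0);
        if (n > 0) continue;                      // discard any trailing data
        if (n == -1 && errno == EINTR) continue;  // interrupted by a signal, retry
        const bool peerClosedFirst = (n == 0);    // 0 means orderly shutdown by the peer
        close(fd);
        return peerClosedFirst;
    }
}
```

With this ordering the client sends the first FIN, so the TIME_WAIT state ends up on the client side rather than on the server.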

Use lsof -u `whoami` | wc -l to find how many open files the user has

I had a similar problem. The quick solution is:

ulimit -n 4096

The explanation is as follows: each server connection is a file descriptor. On CentOS, Red Hat and Fedora, and probably others, the per-user file limit is 1024 (no idea why). You can see it easily by typing: ulimit -n

Note that this has little to do with the system-wide maximum (/proc/sys/fs/file-max).

In my case the problem was with Redis, so I did:

ulimit -n 4096
redis-server -c xxxx

In your case, start your own server instead of Redis.

I had the same problem and I wasn't bothering to check the return values of the close() calls. When I started checking the return value, the problem mysteriously vanished.

I can only assume it was an optimisation glitch in the compiler (gcc in my case), which assumed that close() calls have no side effects and can be omitted if their return values aren't used.

I had this problem too. You have a file handle leak. You can debug this by printing out a list of all the open file handles (on POSIX systems):

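#include <fcntl.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <iostream>

using std::cerr;
typedef int32_t s32;   // s32 is not defined in the snippet; assumed to be a 32-bit signed int

void showFDInfo( s32 fd );

// Walk every possible descriptor number and describe the ones that are open.
void showFDInfo()
{
    s32 numHandles = getdtablesize();

    for ( s32 i = 0; i < numHandles; i++ )
    {
        s32 fd_flags = fcntl( i, F_GETFD );
        if ( fd_flags == -1 ) continue;   // not an open descriptor

        showFDInfo( i );
    }
}

// Print the target, flags and lock state of one descriptor to stderr.
void showFDInfo( s32 fd )
{
    char buf[256];

    s32 fd_flags = fcntl( fd, F_GETFD );
    if ( fd_flags == -1 ) return;

    s32 fl_flags = fcntl( fd, F_GETFL );
    if ( fl_flags == -1 ) return;

    char path[256];
    snprintf( path, sizeof( path ), "/proc/self/fd/%d", fd );

    // readlink() does not null-terminate, so zero the buffer and keep one byte spare
    memset( &buf[0], 0, sizeof( buf ) );
    ssize_t s = readlink( path, &buf[0], sizeof( buf ) - 1 );
    if ( s == -1 )
    {
        cerr << fd << " (" << path << "): " << "not available\n";
        return;
    }
    cerr << fd << " (" << buf << "): ";

    if ( fd_flags & FD_CLOEXEC )  cerr << "cloexec ";

    // file status
    if ( fl_flags & O_APPEND   )  cerr << "append ";
    if ( fl_flags & O_NONBLOCK )  cerr << "nonblock ";

    // acc mode: O_RDONLY is 0, so compare the masked value instead of testing bits
    s32 acc_mode = fl_flags & O_ACCMODE;
    if ( acc_mode == O_RDONLY )  cerr << "read-only ";
    if ( acc_mode == O_RDWR   )  cerr << "read-write ";
    if ( acc_mode == O_WRONLY )  cerr << "write-only ";

    if ( fl_flags & O_DSYNC    )  cerr << "dsync ";
    if ( fl_flags & O_RSYNC    )  cerr << "rsync ";
    if ( fl_flags & O_SYNC     )  cerr << "sync ";

    // ask whether a whole-file write lock would conflict with an existing lock
    struct flock fl;
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;
    if ( fcntl( fd, F_GETLK, &fl ) != -1 && fl.l_type != F_UNLCK )
    {
        if ( fl.l_type == F_WRLCK )
            cerr << "write-locked";
        else
            cerr << "read-locked";
        cerr << "(pid:" << fl.l_pid << ") ";
    }
    cerr << "\n";
}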

By dumping out all the open files you will quickly figure out where your file handle leak is.

If your server spawns subprocesses, e.g. if this is a 'fork' style server, or if you are spawning other processes (e.g. via CGI), you have to make sure to create your file handles with "cloexec", both for real files and for sockets.

Without cloexec, every time you fork or spawn, all open file handles are inherited by the child process and stay open after it calls exec.
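Setting the flag on a descriptor that already exists can be done with fcntl(). The sketch below is an illustration only; setCloexec() and isCloexec() are hypothetical helper names. On Linux the flag can also be requested atomically at creation time with O_CLOEXEC for open(), or SOCK_CLOEXEC for socket() and accept4(), which avoids a window in which another thread could fork before the flag is set.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Marks fd close-on-exec while preserving its other descriptor flags.
// Returns false if fcntl() fails.
bool setCloexec(int fd)
{
    assert(fd >= 0);
    const int flags = fcntl(fd, F_GETFD);
    if (flags == -1) return false;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) != -1;
}

// Reports whether fd is open and currently has FD_CLOEXEC set.
bool isCloexec(int fd)
{
    const int flags = fcntl(fd, F_GETFD);
    return flags != -1 && (flags & FD_CLOEXEC) != 0;
}
```

A server would typically call setCloexec() on the listening socket and on every accepted socket, unless it creates them with the atomic flags mentioned above.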

It is also really easy to fail to close network sockets - e.g. just abandoning them when the remote party disconnects. This will leak handles like crazy.
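In C++, one common way to make abandoned sockets harder to leak is to tie each descriptor to an object whose destructor closes it. The class below is a minimal sketch of that idea, not something from the answer above: UniqueFd and fdIsOpen() are hypothetical names, and std::exchange assumes C++14 or later.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>
#include <utility>

// Owns one file descriptor and closes it when the owner goes out of scope.
class UniqueFd {
public:
    explicit UniqueFd(int fd = -1) : fd_(fd) {}
    ~UniqueFd() { reset(); }

    // Copying would lead to a double close, so only moves are allowed.
    UniqueFd(const UniqueFd&) = delete;
    UniqueFd& operator=(const UniqueFd&) = delete;

    UniqueFd(UniqueFd&& other) noexcept : fd_(other.release()) {}
    UniqueFd& operator=(UniqueFd&& other) noexcept
    {
        if (this != &other) reset(other.release());
        return *this;
    }

    int get() const { return fd_; }

    // Gives up ownership without closing; the caller must close the result.
    int release() { return std::exchange(fd_, -1); }

    // Closes the owned descriptor (if any) and takes ownership of fd.
    void reset(int fd = -1)
    {
        assert(fd < 0 || fd != fd_);   // re-adopting the owned fd would close it
        if (fd_ >= 0) ::close(fd_);
        fd_ = fd;
    }

private:
    int fd_;
};

// True if fd currently refers to an open descriptor in this process.
bool fdIsOpen(int fd) { return fcntl(fd, F_GETFD) != -1; }
```

Wrapping the result of accept() in a UniqueFd means every early return or exception in the request handler still closes the socket, including the path taken when the remote party disconnects.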

This error means that the maximum number of simultaneously open files has been reached.

Solved:

At the end of the file /etc/security/limits.conf you need to add the following lines:

* soft nofile 16384
* hard nofile 16384

In the current console, as root (sudo does not work, because ulimit is a shell builtin), run:

ulimit -n 16384

This step is optional if you are able to restart the server.

In /etc/nginx/nginx.conf, set worker_connections to 16384 divided by the value of worker_processes.

If you did not run ulimit -n 16384, you need to reboot; after that the problem will go away.

PS:

If, after the fix, the error accept() failed (24: Too many open files) still shows up in the logs, set the following in the nginx configuration (for example):

worker_processes 2;

worker_rlimit_nofile 16384;

events {
    worker_connections 8192;
}

One more piece of information, for CentOS: when a process is launched with systemctl, you have to modify its systemd unit file, /usr/lib/systemd/system/processName.service. Add this line to the [Service] section of the file:

LimitNOFILE=50000

Then reload the systemd configuration:

systemctl daemon-reload

On macOS, show the limits:

launchctl limit maxfiles

The result looks like: maxfiles 256 1000

If the numbers (soft limit and hard limit) are too low, raise them:

sudo launchctl limit maxfiles 65536 200000

For future reference, I ran into a similar problem; I was creating too many file descriptors (FDs) by creating too many files and sockets (on Unix OSs, everything is a FD). My solution was to increase FDs at runtime with setrlimit().

First I got the FD limits, with the following code:

// This goes somewhere in your code (needs <sys/resource.h> and <iostream>)
struct rlimit rlim;

if (getrlimit(RLIMIT_NOFILE, &rlim) == 0) {
    std::cout << "Soft limit: " << rlim.rlim_cur << std::endl;
    std::cout << "Hard limit: " << rlim.rlim_max << std::endl;
} else {
    std::cout << "Unable to get file descriptor limits" << std::endl;
}

After running getrlimit(), I could confirm that on my system, the soft limit is 256 FDs, and the hard limit is infinite FDs (this is different depending on your distro and specs). Since I was creating > 300 FDs between files and sockets, my code was crashing.

In my case I couldn't decrease the number of FDs, so I decided to increase the FD soft limit instead, with this code:

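// This goes somewhere in your code (needs <sys/resource.h> and <iostream>).
// NEW_SOFT_LIMIT and NEW_HARD_LIMIT are placeholders for the values you want;
// raising the hard limit above its current value requires privileges.
struct rlimit rlim;

rlim.rlim_cur = NEW_SOFT_LIMIT;
rlim.rlim_max = NEW_HARD_LIMIT;

if (setrlimit(RLIMIT_NOFILE, &rlim) == -1) {
    std::cout << "Unable to set file descriptor limits" << std::endl;
}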
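An unprivileged process can raise its soft limit only up to the current hard limit, so a slightly more defensive variant reads the limits first and clamps the request. The function below is a sketch under that assumption; raiseSoftFdLimit() is a hypothetical name, and it deliberately leaves the hard limit untouched.

```cpp
#include <algorithm>
#include <cassert>
#include <sys/resource.h>

// Raises the soft RLIMIT_NOFILE towards `wanted`, clamped to the current hard
// limit. Never lowers the soft limit and never changes the hard limit.
// Returns true if the soft limit is now at least min(wanted, hard limit).
bool raiseSoftFdLimit(rlim_t wanted)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) return false;
    if (rl.rlim_cur >= wanted) return true;        // already high enough

    rl.rlim_cur = std::min(wanted, rl.rlim_max);   // cannot exceed the hard limit
    assert(rl.rlim_cur <= rl.rlim_max);
    return setrlimit(RLIMIT_NOFILE, &rl) == 0;
}
```

Calling raiseSoftFdLimit(4096) early in main(), before any sockets are created, would be the in-process counterpart of running ulimit -n 4096 in the shell that starts the server.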

Note that you can also get the number of FDs that you are using, and the source of these FDs, with this code.

You can also find more information on getrlimit() and setrlimit() here and here.

I had a similar issue on Ubuntu 18 on vSphere. The cause: the nginx.conf config file contained too many log files and sockets, and sockets are treated as files on Linux. On nginx -s reload or sudo service nginx start/restart, the "Too many open files" error appeared in error.log.

The NGINX worker processes were launched by the nginx user. The ulimit (soft and hard) for the nginx user was 65536. Neither ulimit nor the limits.conf setting worked.

The rlimit setting in nginx.conf did not help either: worker_rlimit_nofile 65536;

The solution that worked was:

$ mkdir -p /etc/systemd/system/nginx.service.d
$ nano /etc/systemd/system/nginx.service.d/nginx.conf
[Service]
LimitNOFILE=30000
$ systemctl daemon-reload
$ systemctl restart nginx.service