ruby-on-rails tcp gitlab gitlab-omnibus openvz

GitLab exceeds numtcpsock beancounter limit (OpenVZ)

What is the best way to find out what is going wrong with GitLab (the only application running on my Ubuntu server with Plesk Onyx)? Every time I look at /proc/user_beancounters, the numtcpsock value is in a normal range (< 100), yet from time to time some GitLab processes seem to exceed the numtcpsock limit (3000). This has happened more than 2300 times, and the virtual server (OpenVZ) crashes.

I have already limited the Redis and PostgreSQL connections in /etc/gitlab/gitlab.rb:

postgresql['shared_buffers'] = "30MB"
postgresql['max_connections'] = 100

redis['maxclients'] = "500"
redis['tcp_timeout'] = "20"
redis['tcp_keepalive'] = "10"

sudo gitlab-ctl reconfigure && sudo gitlab-ctl restart

But that does not seem to prevent the server crashes. I need an approach to fix this problem. Do you have any ideas?

Edit:

The server is used by only about 3-5 people. netstat -pnt | wc -l returns about 49 TCP connections, and cat /proc/user_beancounters shows numtcpsock at 33 at the moment. All of the connections except my SSH session are on the local IP (127.0.0.1).

Here are some examples:

tcp        0      0 127.0.0.1:47280         127.0.0.1:9168          TIME_WAIT   -
tcp        0      0 127.0.0.1:9229          127.0.0.1:34810         TIME_WAIT   -
tcp        0      0 127.0.0.1:9100          127.0.0.1:45758         TIME_WAIT   -
tcp        0      0 127.0.0.1:56264         127.0.0.1:8082          TIME_WAIT   -
tcp        0      0 127.0.0.1:9090          127.0.0.1:43670         TIME_WAIT   -
tcp        0      0 127.0.0.1:9121          127.0.0.1:41636         TIME_WAIT   -
tcp        0      0 127.0.0.1:9236          127.0.0.1:42842         TIME_WAIT   -
tcp        0      0 127.0.0.1:9090          127.0.0.1:43926         TIME_WAIT   -
tcp        0      0 127.0.0.1:9090          127.0.0.1:44538         TIME_WAIT   -

A firewall and fail2ban with many jails (SSH etc.) are also active on the server.

Solution

The numtcpsock value is the number of TCP sockets in use on your OpenVZ virtual server. Exceeding the limit would not by itself crash your server, but it would prevent any new TCP sockets from being created, and if you only have remote access to the virtual server you would effectively be locked out.
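Since the held value is only a snapshot, the maxheld and failcnt columns of /proc/user_beancounters are the ones to watch when the spike is already gone by the time you look. Here is a minimal Python sketch for pulling out that row. It assumes the usual column layout (held, maxheld, barrier, limit, failcnt); the function name parse_beancounters and the sample numbers are made up for illustration and are not taken from your server.

```python
# Sketch: extract one resource row from /proc/user_beancounters text.
# SAMPLE is illustrative data, not real output from the asker's server.
SAMPLE = """\
Version: 2.5
       uid  resource           held    maxheld    barrier      limit    failcnt
      101:  kmemsize       11217666   16650240  243269632  268435456          0
            numtcpsock           33       3000       3000       3000       2300
            numproc              80        120        500        500          0
"""

COLUMNS = ("held", "maxheld", "barrier", "limit", "failcnt")


def parse_beancounters(text):
    """Return {resource: {column: int}} for every data row in the text."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # The first row of a container is prefixed with "<uid>:"; drop it.
        if fields and fields[0].endswith(":"):
            fields = fields[1:]
        # A data row is a resource name followed by five integers.
        if len(fields) == 6 and all(f.isdigit() for f in fields[1:]):
            counters[fields[0]] = dict(zip(COLUMNS, map(int, fields[1:])))
    return counters


row = parse_beancounters(SAMPLE)["numtcpsock"]
print("numtcpsock held=%d maxheld=%d limit=%d failcnt=%d"
      % (row["held"], row["maxheld"], row["limit"], row["failcnt"]))
```

On a real container you would pass in the contents of /proc/user_beancounters instead of SAMPLE (reading it usually requires root). Logging this row every minute or so would show when failcnt increases, which narrows down the time window of the spike.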

I am not sure how GitLab would be reaching your numtcpsock limit of 3000 unless you have a couple hundred concurrent users. If that is the case, you would simply need to have the numtcpsock limit raised.

The more likely cause of your numtcpsock issues, if you have a public IP address, would be excessive connections to SSH, HTTP or some other popular TCP service hackers like to probe.

When you are having numtcpsock issues, you would want to check the output of netstat -pnt to see what TCP connections are open on your server. That output will show who is connected and on which port.
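With thousands of lines of netstat output it is easier to look at totals than at individual connections. The following is a rough Python sketch that counts connections per local port and state, and per remote address. The function name summarize_netstat is hypothetical, and the sample text is a subset of the lines from your question.

```python
from collections import Counter

# Sample lines in the format printed by `netstat -pnt` (subset of the question).
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:47280         127.0.0.1:9168          TIME_WAIT   -
tcp        0      0 127.0.0.1:9100          127.0.0.1:45758         TIME_WAIT   -
tcp        0      0 127.0.0.1:9090          127.0.0.1:43670         TIME_WAIT   -
tcp        0      0 127.0.0.1:9090          127.0.0.1:43926         TIME_WAIT   -
tcp        0      0 127.0.0.1:9090          127.0.0.1:44538         TIME_WAIT   -
"""


def summarize_netstat(text):
    """Count connections by (local port, state) and by remote address."""
    by_port_state = Counter()
    by_remote = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Connection rows: proto, recv-q, send-q, local, remote, state, program
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skip headers and anything that is not a TCP row
        local, remote, state = fields[3], fields[4], fields[5]
        local_port = local.rsplit(":", 1)[1]
        remote_host = remote.rsplit(":", 1)[0]
        by_port_state[(local_port, state)] += 1
        by_remote[remote_host] += 1
    return by_port_state, by_remote


ports, remotes = summarize_netstat(SAMPLE)
for (port, state), count in ports.most_common(3):
    print(port, state, count)
for host, count in remotes.most_common(3):
    print(host, count)
```

If one local port or one remote address dominates the counts while numtcpsock is high, that points to the service or client responsible. If almost everything is on 127.0.0.1, as in your sample, the sockets are being opened by local processes rather than by outside probes.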

To prevent excessive TCP connections in the first place, if the problem is indeed GitLab, make sure that it is not configured in a way that will eat all your available connections. If the issue turns out to be caused by external connections that you do not want, make sure you have some reasonable firewall rules in place or a tool like fail2ban to do it for you.

Edit: Explanation of netstat flags used in answer (taken from netstat man page in Ubuntu 16.04)

-p, --program: show the PID and program to which each socket belongs
-l, --listening: show only listening sockets
-n, --numeric: show numerical addresses instead of trying to determine symbolic host, port or user names
-t, --tcp: show TCP sockets