What is the best way to find out what is wrong with GitLab (the only application used on an Ubuntu Plesk Onyx server)? Every time I look at /proc/user_beancounters
the numtcpsock value is in a normal state (< 100), yet sometimes some GitLab processes seem to exceed the numtcpsock limit (3000). This has happened more than 2300 times, and the virtual server (OpenVZ) crashes.
I have already limited the Redis and PostgreSQL connections in /etc/gitlab/gitlab.rb:
postgresql['shared_buffers'] = "30MB"
postgresql['max_connections'] = 100
redis['maxclients'] = "500"
redis['tcp_timeout'] = "20"
redis['tcp_keepalive'] = "10"
sudo gitlab-ctl reconfigure && sudo gitlab-ctl restart
But that does not seem to prevent the server crashes. I need an approach to fix this problem. Do you have any ideas?
Edit:
The server is only used by about 3-5 people. netstat -pnt | wc -l
returns about 49 TCP connections. In cat /proc/user_beancounters
numtcpsock
is 33 at the moment. All of the connections, except my SSH connection, are on the local IP.
Here are some examples:
tcp 0 0 127.0.0.1:47280 127.0.0.1:9168 TIME_WAIT -
tcp 0 0 127.0.0.1:9229 127.0.0.1:34810 TIME_WAIT -
tcp 0 0 127.0.0.1:9100 127.0.0.1:45758 TIME_WAIT -
tcp 0 0 127.0.0.1:56264 127.0.0.1:8082 TIME_WAIT -
tcp 0 0 127.0.0.1:9090 127.0.0.1:43670 TIME_WAIT -
tcp 0 0 127.0.0.1:9121 127.0.0.1:41636 TIME_WAIT -
tcp 0 0 127.0.0.1:9236 127.0.0.1:42842 TIME_WAIT -
tcp 0 0 127.0.0.1:9090 127.0.0.1:43926 TIME_WAIT -
tcp 0 0 127.0.0.1:9090 127.0.0.1:44538 TIME_WAIT -
A firewall and fail2ban with many jails (SSH etc.) are also active on the server.
The numtcpsock value is the number of TCP sockets in use on your OpenVZ virtual server. Exceeding the limit would not crash your server, but it would prevent any new TCP sockets from being created, and if you only have remote access to the virtual server, you would effectively be locked out.
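To see how often the limit was actually hit, the failcnt column of /proc/user_beancounters is the number to watch. Below is a rough Python sketch for pulling one resource row out of that file, assuming the usual column layout (resource, held, maxheld, barrier, limit, failcnt). The function name parse_beancounter is made up for illustration, and the sample text is illustrative too: only the numtcpsock values 33, 3000 and 2300 come from the question, the rest is assumed.

```python
from typing import Dict, Optional

# Column names that follow the resource name in /proc/user_beancounters.
FIELDS = ("held", "maxheld", "barrier", "limit", "failcnt")


def parse_beancounter(text: str, resource: str) -> Optional[Dict[str, int]]:
    """Return the counters for one resource, or None if it is not listed."""
    for line in text.splitlines():
        tokens = line.split()
        # The first resource row of a container is prefixed with "<uid>:".
        if tokens and tokens[0].endswith(":"):
            tokens = tokens[1:]
        if len(tokens) == 6 and tokens[0] == resource:
            return dict(zip(FIELDS, map(int, tokens[1:])))
    return None


# Illustrative sample, not real output from the server in question.
SAMPLE = """\
Version: 2.5
       uid  resource           held    maxheld    barrier      limit    failcnt
      101:  kmemsize        2670652    2891776   14372700   14790164          0
            numtcpsock           33       3000       3000       3000       2300
"""

stats = parse_beancounter(SAMPLE, "numtcpsock")
print(stats)

# On the real server (reading the file normally requires root):
#   with open("/proc/user_beancounters") as fh:
#       stats = parse_beancounter(fh.read(), "numtcpsock")
```

A low held value next to a high maxheld and a growing failcnt would suggest short bursts of sockets rather than a steady load, which matches seeing a normal value whenever you look manually.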
I am not sure how GitLab would reach your numtcpsock limit of 3000 unless you have a couple hundred concurrent users. If that is the case, you would simply need to raise your numtcpsock limit.
The more likely cause of your numtcpsock issues, if you have a public IP address, would be excessive connections to SSH, HTTP, or some other popular TCP service that hackers like to probe.
When you are having numtcpsock issues, you would want to check the output of netstat -pnt
to see what TCP connections are open on your server. That output will show who is connected and on which port.
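When there are thousands of lines, reading that output by eye gets tedious. As a minimal sketch, assuming the standard netstat column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name), the lines can be tallied by state and by remote address in Python. The function name summarize_netstat is hypothetical, and the sample lines are copied from the question.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize_netstat(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count TCP connections per state and per remote IP address."""
    states: Counter = Counter()
    peers: Counter = Counter()
    for line in lines:
        fields = line.split()
        # Skip headers and anything that is not a tcp/tcp6 row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign, state = fields[4], fields[5]
        # Drop the port: everything after the last ":" in the address.
        peer_ip = foreign.rsplit(":", 1)[0]
        states[state] += 1
        peers[peer_ip] += 1
    return states, peers


# Sample rows taken from the question.
SAMPLE_LINES = [
    "Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name",
    "tcp 0 0 127.0.0.1:47280 127.0.0.1:9168 TIME_WAIT -",
    "tcp 0 0 127.0.0.1:9229 127.0.0.1:34810 TIME_WAIT -",
    "tcp 0 0 127.0.0.1:9100 127.0.0.1:45758 TIME_WAIT -",
]

states, peers = summarize_netstat(SAMPLE_LINES)
print(states.most_common(5))
print(peers.most_common(5))

# Against the live system, the input could come from:
#   import subprocess
#   out = subprocess.run(["netstat", "-pnt"], capture_output=True, text=True).stdout
#   states, peers = summarize_netstat(out.splitlines())
```

If one remote address or one state dominates the counts while the problem is occurring, that points to where the sockets are going.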
To prevent excessive TCP connections in the first place, if the problem is indeed GitLab, make sure that it is not configured in a way that will eat all your available connections. If the issue turns out to be caused by external connections that you do not want, make sure you have some reasonable firewall rules in place, or a tool like fail2ban to do it for you.
Edit: Explanation of netstat flags used in answer (taken from netstat man page in Ubuntu 16.04)
-p, --program: show the PID and program to which each socket belongs
-l, --listening: show only listening sockets
-n, --numeric: show numerical addresses instead of trying to determine symbolic host, port or user names
-t, --tcp