We deliver IPTV services to our customers through Varnish Cache 6.0, and I am a bit worried that there might be a problem with our Varnish servers. This suspicion is based on the number of customer incident reports we receive when the IPTV stream flows through Varnish instead of coming directly from the backend server.
That is why I would like to eliminate all errors from varnishstat to narrow down the possible reasons for the incidents, since at the moment I don't have a better angle to troubleshoot the problem.
I should say up front that I am far from an expert with Varnish.
So let's dig into the "problem":
varnishstat -1 output:
MAIN.sess_closed 38788 0.01 Session Closed
MAIN.sess_closed_err 15260404 3.47 Session Closed with error
Basically, almost all connections to the Varnish servers close with an error. I set up a virtualized demo server on our network with an identical Varnish configuration, and there the only sess_closed_err increments occurred when I changed channels in VLC media player. Note that I could only run a few VLC instances against that server at the same time, and that our customers use set-top boxes (STBs) to access the service.
So my actual question is, how can I troubleshoot what causes the sessions to close with error?
There are some other counters that will show more specifically what happens with the sessions. The next step in your troubleshooting is therefore to look at these counters:
varnishstat -1 | grep ^MAIN.sc_
I'll elaborate a bit with a typical example:
$ sudo varnishstat -1 | egrep "(sess_closed|sc_)"
MAIN.sess_closed 8918046 1.45 Session Closed
MAIN.sess_closed_err 96244948 15.69 Session Closed with error
MAIN.sc_rem_close 86307498 14.07 Session OK REM_CLOSE
MAIN.sc_req_close 8402217 1.37 Session OK REQ_CLOSE
MAIN.sc_req_http10 45930 0.01 Session Err REQ_HTTP10
MAIN.sc_rx_bad 0 0.00 Session Err RX_BAD
MAIN.sc_rx_body 0 0.00 Session Err RX_BODY
MAIN.sc_rx_junk 132 0.00 Session Err RX_JUNK
MAIN.sc_rx_overflow 2 0.00 Session Err RX_OVERFLOW
MAIN.sc_rx_timeout 96193210 15.68 Session Err RX_TIMEOUT
MAIN.sc_tx_pipe 0 0.00 Session OK TX_PIPE
MAIN.sc_tx_error 0 0.00 Session Err TX_ERROR
MAIN.sc_tx_eof 3 0.00 Session OK TX_EOF
MAIN.sc_resp_close 0 0.00 Session OK RESP_CLOSE
MAIN.sc_overload 0 0.00 Session Err OVERLOAD
MAIN.sc_pipe_overflow 0 0.00 Session Err PIPE_OVERFLOW
MAIN.sc_range_short 0 0.00 Session Err RANGE_SHORT
MAIN.sc_req_http20 0 0.00 Session Err REQ_HTTP20
MAIN.sc_vcl_failure 0 0.00 Session Err VCL_FAILURE
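With this many counters, it can help to list only the non-zero ones, largest first. The following is a minimal sketch, not part of Varnish: rank_session_closes is a hypothetical helper name, and it assumes the four-column layout (name, count, rate, description) that varnishstat -1 prints above.

```shell
# Rank the non-zero MAIN.sc_* counters, largest first.
# Reads `varnishstat -1` output on stdin; field 1 is the counter
# name and field 2 is the cumulative count.
rank_session_closes() {
    awk '$1 ~ /^MAIN\.sc_/ && $2 > 0 { print $2, $1 }' | sort -rn
}

# Usage on a live server (may need sudo):
#   varnishstat -1 | rank_session_closes
```

Whichever counter ends up on the first line is the close reason to investigate first.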
The output from this specific environment shows that the majority of the sessions that close with an error do so because of a receive timeout (MAIN.sc_rx_timeout). This timeout controls how long Varnish keeps idle connections open, and is set using the timeout_idle parameter to varnishd. Its value is 5 seconds by default. Use varnishadm to see the current value and the description of the timeout:
$ sudo varnishadm param.show timeout_idle
timeout_idle
Value is: 10.000 [seconds]
Default is: 5.000
Minimum is: 0.000
Idle timeout for client connections.
A connection is considered idle until we have received the full
request headers.
This parameter is particularly relevant for HTTP1 keepalive
connections which are closed unless the next request is
received before this timeout is reached.
Increasing timeout_idle will likely reduce the number of sessions that are closed due to idle timeout. This can be done by setting the value as a parameter when starting Varnish. Example:
varnishd [...] -p timeout_idle=15
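The parameter can also be changed on a running instance with varnishadm param.set, which is handy for trying out a value before making it permanent. A minimal sketch follows; set_idle_timeout is a hypothetical wrapper name, and a value set this way is not kept across a varnishd restart, so the -p option above is still needed to persist it.

```shell
# Change timeout_idle on the running varnishd, then print the
# parameter so the new value can be confirmed.
# Not persisted: a restart falls back to the startup value.
set_idle_timeout() {
    varnishadm param.set timeout_idle "$1" &&
        varnishadm param.show timeout_idle
}

# Example (run as a user allowed to use varnishadm):
#   set_idle_timeout 15
```

After changing the value, watching whether MAIN.sc_rx_timeout grows more slowly should indicate whether the idle timeout was really the cause.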
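Note that there are pros and cons related to increasing this timeout: clients that reuse keepalive connections have to reconnect less often, but every idle connection keeps occupying a file descriptor and session memory on the Varnish server for longer.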