
Troubleshooting varnish sess_closed_err


We deliver IPTV services to our customers through Varnish Cache 6.0, and I am worried that there might be a problem with our Varnish servers. This suspicion is based on the number of customer incident reports we receive when the IPTV stream flows through Varnish instead of coming directly from the backend server.

That is why I would like to eliminate all errors reported by varnishstat, to narrow down the possible causes of the incidents. At the moment I don't have a better angle for troubleshooting the problem.

Note that I am far from an expert with Varnish.

So let's dig into the "problem":

varnishstat -1 output:

MAIN.sess_closed 38788 0.01 Session Closed

MAIN.sess_closed_err 15260404 3.47 Session Closed with error

Basically, almost all connections to the Varnish servers close with an error. I set up a virtualized demo server in our network with an identical Varnish configuration, and there the only sess_closed_err events were generated when I changed channels in my VLC media player. Note that I could run only a few VLC instances against the demo server at the same time, and that our customers use set-top boxes (STBs) to access the service.

So my actual question is: how can I troubleshoot what causes the sessions to close with an error?


Solution

  • There are some other counters that will show more specifically what happens with the sessions. The next step in your troubleshooting is therefore to look at these counters:

    varnishstat -1 | grep ^MAIN.sc_
    

    I'll elaborate a bit with a typical example:

    $ sudo varnishstat -1 | egrep "(sess_closed|sc_)"
    MAIN.sess_closed                 8918046         1.45 Session Closed
    MAIN.sess_closed_err            96244948        15.69 Session Closed with error
    MAIN.sc_rem_close               86307498        14.07 Session OK  REM_CLOSE
    MAIN.sc_req_close                8402217         1.37 Session OK  REQ_CLOSE
    MAIN.sc_req_http10                 45930         0.01 Session Err REQ_HTTP10
    MAIN.sc_rx_bad                         0         0.00 Session Err RX_BAD
    MAIN.sc_rx_body                        0         0.00 Session Err RX_BODY
    MAIN.sc_rx_junk                      132         0.00 Session Err RX_JUNK
    MAIN.sc_rx_overflow                    2         0.00 Session Err RX_OVERFLOW
    MAIN.sc_rx_timeout              96193210        15.68 Session Err RX_TIMEOUT
    MAIN.sc_tx_pipe                        0         0.00 Session OK  TX_PIPE
    MAIN.sc_tx_error                       0         0.00 Session Err TX_ERROR
    MAIN.sc_tx_eof                         3         0.00 Session OK  TX_EOF
    MAIN.sc_resp_close                     0         0.00 Session OK  RESP_CLOSE
    MAIN.sc_overload                       0         0.00 Session Err OVERLOAD
    MAIN.sc_pipe_overflow                  0         0.00 Session Err PIPE_OVERFLOW
    MAIN.sc_range_short                    0         0.00 Session Err RANGE_SHORT
    MAIN.sc_req_http20                     0         0.00 Session Err REQ_HTTP20
    MAIN.sc_vcl_failure                    0         0.00 Session Err VCL_FAILURE
    

    The output from this specific environment shows that nearly all of the sessions that close with an error (96193210 of 96244948, about 99.9%) do so because of a receive timeout (MAIN.sc_rx_timeout). This timeout controls how long Varnish keeps idle connections open, and is set using the timeout_idle parameter to varnishd. Its default value is 5 seconds. Use varnishadm to see the current value and the description of the timeout:

    $ sudo varnishadm param.show timeout_idle
    timeout_idle
            Value is: 10.000 [seconds]
            Default is: 5.000
            Minimum is: 0.000
    
            Idle timeout for client connections.
    
            A connection is considered idle until we have received the full
            request headers.
    
            This parameter is particularly relevant for HTTP1 keepalive
            connections which are closed unless the next request is
            received before this timeout is reached.
    

    Increasing timeout_idle will likely reduce the number of sessions that are closed due to idle timeout. This can be done by setting the value as a parameter when starting varnish. Example:

    varnishd [...] -p timeout_idle=15
    
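If restarting varnishd is inconvenient, the parameter can also be changed on a running instance through the management CLI. The following is only a sketch: set_timeout_idle is a hypothetical helper name, and it assumes varnishadm can reach the management interface (this often requires sudo). A value set this way is not persisted, so it is lost when varnishd restarts unless the -p option above is also updated.

```shell
# Hypothetical helper: change timeout_idle on a running varnishd.
# Assumes varnishadm can reach the management interface (often needs sudo).
set_timeout_idle() {
    # Accept only a plain number of seconds (digits and dots).
    case "$1" in
        ''|*[!0-9.]*)
            echo "usage: set_timeout_idle SECONDS" >&2
            return 2
            ;;
    esac
    # Apply the new value, then print the parameter to confirm it.
    varnishadm param.set timeout_idle "$1" &&
        varnishadm param.show timeout_idle
}

# Example: set_timeout_idle 15
```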

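    Note that there are pros and cons to increasing this timeout: a longer timeout lets clients reuse keepalive connections across longer gaps between requests, but every idle connection that stays open keeps holding a file descriptor and session resources on the Varnish server.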
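
The counter check from the start of this answer can also be scripted, which helps when comparing several servers. This is a minimal sketch and not part of Varnish: rank_session_errors is a hypothetical helper name, and it assumes the description column reads "Session Err ..." exactly as in the output shown earlier.

```shell
# Rank the "Session Err" counters from `varnishstat -1` output, largest first.
# Reads varnishstat output on stdin, so it also works on saved output.
# Optional argument: how many counters to print (default 5).
rank_session_errors() {
    awk '$1 ~ /^MAIN\.sc_/ && $4 == "Session" && $5 == "Err" { print $2, $1 }' |
        sort -rn |
        head -n "${1:-5}"
}

# Usage on a live server (may need sudo):
#   varnishstat -1 | rank_session_errors 3
```

With the example output above, the first line printed would be the MAIN.sc_rx_timeout counter, which points to timeout_idle as the place to look.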