http nginx logging wrk

How to find the errors behind a large number of non-2xx or 3xx responses when load testing an NGINX reverse proxy with wrk


We are testing NGINX as a reverse proxy in front of two sample NGINX web servers, using wrk as the load generator. The web servers' configuration is very simple: each serves a static page (similar to the default welcome page), and the NGINX proxy distributes traffic in round-robin fashion. The aim of the test is to measure how the operating system running the NGINX reverse proxy affects the results (we are comparing CentOS 7, Debian 10 and FreeBSD 12). On every OS except FreeBSD, the results contain a large number of non-2xx or 3xx responses:

  10 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    74.50ms  221.36ms   1.90s    91.31%
    Req/Sec     5.88k     4.56k   16.01k    43.96%
  Latency Distribution
     50%    4.68ms
     75%    7.71ms
     90%  196.01ms
     99%    1.03s 
  3509526 requests in 1.00m, 1.11GB read
  Socket errors: connect 0, read 0, write 0, timeout 875
  Non-2xx or 3xx responses: 3285230
Requests/sec:  58431.20
Transfer/sec:     18.96MB

As you can see, about 94 percent of the responses (3285230 of 3509526) fall into this category. I've tried several NGINX logging configurations to catch these errors, but all I see in the log is 200 OK. How can I get more information about these responses?
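For reference, the share of non-2xx/3xx responses can be recomputed from the wrk summary itself. This is only a minimal sketch, assuming the summary text is piped in or saved to a file; `non2xx_share` and `wrk.txt` are hypothetical names, not part of wrk:

```shell
# non2xx_share (hypothetical helper): reads wrk's summary on stdin and prints
# the percentage of responses that were neither 2xx nor 3xx.
non2xx_share() {
  awk '
    /requests in/              { total = $1 }   # e.g. "3509526 requests in 1.00m, ..."
    /Non-2xx or 3xx responses/ { bad = $NF }    # e.g. "Non-2xx or 3xx responses: 3285230"
    END { if (total > 0) printf "%.1f\n", 100 * bad / total }
  '
}

# Usage, assuming the wrk output was saved to wrk.txt:
#   non2xx_share < wrk.txt
```

For the run above this gives 93.6.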


Solution

  • After some research, I was able to track this down with tcpdump on the proxy node.

    While wrk was running against the proxy, I ran tcpdump as follows:

    tcpdump -i ens192 port 80 -nn
    

    The output, though quite large, was revealing:

    10:53:33.317363 IP x.x.x.x.80 > y.y.y.y.28375: Flags [P.], seq 389527:389857, ack 37920, win 509, options [nop,nop,TS val 825684835 ecr 679917942], length 330: HTTP: HTTP/1.1 502 Bad Gateway
    

    I could not see the error in the nginx logs because, as far as I could tell, nginx in reverse proxy mode logs these results only in debug mode, and debug logging slows processing so much that the error no longer occurs. With tcpdump, I could see the failing responses (502 Bad Gateway) directly in the packets.
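    Because the capture is large, it can help to tally the status lines instead of reading them one by one. A minimal sketch, assuming tcpdump's one-line summaries (like the one above) were saved to a file; `count_statuses` and `dump.txt` are hypothetical names, and only packets that begin with an HTTP status line are counted:

```shell
# count_statuses (hypothetical helper): reads tcpdump's text output on stdin
# and prints "<status> <count>" for every HTTP/1.x status line it finds.
count_statuses() {
  grep -oE 'HTTP/1\.[01] [0-9]{3}' |
    awk '{ n[$2]++ } END { for (s in n) print s, n[s] }' |
    sort
}

# Usage (capturing needs root; interface and port as in the command above):
#   tcpdump -i ens192 port 80 -nn > dump.txt    # stop with Ctrl-C after the wrk run
#   count_statuses < dump.txt
```

    Saving the capture to a file first avoids losing the tally when tcpdump is interrupted with Ctrl-C.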