nginx amazon-s3 network-programming tomcat9 proxmox

Videos larger than 2 MB are not passed from the Nginx server to the backend and on to the AWS S3 bucket


We have been developing an enterprise application for the last two years. It is based on a microservice architecture: nine services, each with its own database, and an Angular frontend served by NGINX that calls the microservices. During development, we ran these services and their databases on a Hetzner cloud server with 4 GB RAM and 2 CPUs over the internal network, and everything worked seamlessly. We upload all images, PDFs, and videos to AWS S3, and that went smoothly too: videos of all sizes were uploaded and played without any issues.

We liked Hetzner and decided to go to production with them as well. We took the first server, installed Proxmox on it, and deployed our services in LXC containers. I tested again there and found no problems.

We then took a second server, installed Proxmox on it, and clustered the two. This is where the problems started: we hired a network engineer, who configured a bridged network between the containers on both nodes. The containers can ping each other, and telnet also connects over the internal network. The MTU set on this bridge is 1400.
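For reference, a bridge MTU of 1400 leaves less room per packet than the common Ethernet default of 1500. The following is a minimal Python sketch of that arithmetic, assuming IPv4 and TCP headers without options (20 bytes each) and an 8-byte ICMP echo header. The function names are illustrative only and not part of any tool.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options
ICMP_HEADER = 8   # bytes, ICMP echo header


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - TCP_HEADER


def ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


if __name__ == "__main__":
    for mtu in (1400, 1500):
        print(f"MTU {mtu}: TCP MSS {tcp_mss(mtu)}, ping payload {ping_payload(mtu)}")
```

With the iputils `ping` on Linux, a probe such as `ping -M do -s 1372 <peer>` sends a non-fragmentable packet of exactly 1400 bytes, so comparing probes at and just above that size between containers is one way to see whether every interface on the path agrees on the MTU.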

Primary problem: we are no longer able to upload videos larger than 2 MB to S3 from this network.

Other problems: these are intermittent issues noted in the logs:

  1. NGINX - 504 Gateway Time-out errors like the following, on multiple services --> upstream timed out (110: Connection timed out) while reading response header from upstream, client: 223.235.101.169, server: abc.xyz.com, request: "GET /courses/course HTTP/1.1", upstream: "http://10.10.XX.XX:8080//courses/course/toBeApprove", host: " abc.xyz.com, ", referrer: "https:// abc.xyz.com, /"

  2. Tomcat- com.amazonaws.services.s3.model.AmazonS3Exception: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. (Service: Amazon S3; Status Code: 400; Error Code: RequestTimeout; Request ID: 7J2EHKVDWQP3367G; S3 Extended Request ID: xGGCQhESxh/Mo6ddwtGYShLIeCJYbgCRT8oGleQu/IfguEfbZpTQXG/AIzgLnG2F5YuCqk7vVE8=), S3 Extended Request ID: xGGCQhESxh/Mo6ddwtGYShLIeCJYbgCRT8oGleQu/IfguEfbZpTQXG/AIzgLnG2F5YuCqk7vVE8=

(we increased all known timeouts, both in nginx and tomcat)

  3. MySQL - 2022-09-08T04:24:27.235964Z 8 [Warning] [MY-010055] [Server] IP address '10.10.XX.XX' could not be resolved: Name or service not known

Other key points to note: we allow video uploads of up to 100 MB, so the known limits in the nginx and Tomcat configurations are set accordingly.

Nginx, client_max_body_size 100m;

And Tomcat: <Connector port="8080" protocol="HTTP/1.1" maxPostSize="102400" maxHttpHeaderSize="102400" connectionTimeout="20000" redirectPort="8443" />
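As a side check on units: nginx size values accept `k` and `m` suffixes (multiples of 1024), while Tomcat's `maxPostSize` is a plain byte count. The small Python sketch below, using a hypothetical helper `nginx_size_to_bytes`, puts the two configured values on the same scale. Whether `maxPostSize` affects multipart file uploads at all depends on how the application parses them, so treat this as a sanity check rather than a diagnosis.

```python
def nginx_size_to_bytes(value: str) -> int:
    """Convert an nginx size such as '100m' or '16K' to bytes.

    Hypothetical helper for illustration; it handles only the k/K and m/M
    suffixes that nginx documents for sizes.
    """
    units = {"k": 1024, "m": 1024 ** 2}
    value = value.strip()
    suffix = value[-1].lower()
    if suffix in units:
        return int(value[:-1]) * units[suffix]
    return int(value)


nginx_limit = nginx_size_to_bytes("100m")  # client_max_body_size from the post
tomcat_max_post_size = 102400              # maxPostSize from the post, in bytes

print(f"nginx client_max_body_size: {nginx_limit} bytes")
print(f"Tomcat maxPostSize: {tomcat_max_post_size} bytes "
      f"({tomcat_max_post_size / 1024:.0f} KiB)")
```

Under these assumptions the nginx limit is 104857600 bytes, whereas 102400 bytes is 100 KiB, so the two settings do not describe the same limit.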

During these tests and trials over the last 15 days, we disabled all firewalls while debugging: ufw on the OS, the Proxmox firewall, and even the data center firewall.

This is our nginx.conf

http {
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    ##
    client_body_buffer_size 16K;
    client_header_buffer_size 1k;
    client_max_body_size 100m;
    client_header_timeout 100s;
    client_body_timeout 100s;
    large_client_header_buffers 4 16k;
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 300;
    send_timeout 600;
    proxy_connect_timeout 600;
    proxy_send_timeout 600;
    proxy_read_timeout 600;
    gzip on;
    gzip_comp_level 2;
    gzip_min_length  1000;
    gzip_proxied     expired no-cache no-store private auth;
    gzip_types       text/plain application/x-javascript text/xml text/css application/xml;

These are our primary test/debugging trials.

**1.    Testing with a small video (273 KB)**
a.  Nginx log- clean, nothing related to operations
b.  Tomcat log-
Start- CoursesServiceImpl - addCourse - Used Memory:73
add course 703
image file not null org.springframework.web.multipart.support.StandardMultipartHttpServletRequest$StandardMultipartFile@15476ca3
image save to s3 bucket 
image folder name images
buckets3 lmsdev-cloudfront/images
image s3 bucket for call 
imageUrl https://lmsdev-cloudfront.s3.amazonaws.com/images/703_4_istockphoto-1097843576-612x612.jpg
video file not null org.springframework.web.multipart.support.StandardMultipartHttpServletRequest$StandardMultipartFile@13419d27
video save to s3 bucket 
 video folder name videos
input Stream java.io.ByteArrayInputStream@4da82ff
buckets3 lmsdev-cloudfront/videos
video s3 bucket for call
video url https://lmsdev-cloudfront.s3.amazonaws.com/videos/703_4_giphy360p.mp4
Before Finally - CoursesServiceImpl - addCourse - Used Memory:126
After Finally- CoursesServiceImpl - addCourse - Used Memory:49
c.  S3 bucket
 
[S3 bucket][1]
      [1]: https://i.sstatic.net/T7daW.png



**2.    Testing with a video of 2 MB (fractionally less)**
    a.  The progress bar keeps running for about 5 minutes, then the request times out
    b.  Nginx logs- 
    2022/09/10 16:15:34 [error] 3698306#3698306: *24091 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 223.235.101.169, server: login.pathnam.education, request: "POST /courses/courses/course HTTP/1.1", upstream: "http://10.10.10.10:8080//courses/course", host: "login.pathnam.education", referrer: "https://login.pathnam.education/"
    c.  Tomcat logs-
    Start- CoursesServiceImpl - addCourse - Used Memory:79
    add course 704
    image file not null org.springframework.web.multipart.support.StandardMultipartHttpServletRequest$StandardMultipartFile@352d57e3
    image save to s3 bucket 
    image folder name images
    buckets3 lmsdev-cloudfront/images
    image s3 bucket for call 
    imageUrl https://lmsdev-cloudfront.s3.amazonaws.com/images/704_4_m_Maldives_dest_landscape_l_755_1487.webp
    video file not null org.springframework.web.multipart.support.StandardMultipartHttpServletRequest$StandardMultipartFile@45bdb178
    video save to s3 bucket 
     video folder name videos
    input Stream java.io.ByteArrayInputStream@3a85dab9
    And after a few minutes
    com.amazonaws.SdkClientException: Unable to execute HTTP request: Connection timed out (Write failed)
    d.  S3 Bucket – No entry 

We then tried uploading the same video from our test server, and it was uploaded to the S3 bucket instantly.

I have read the posts about similar problems; most of them are related to php.ini configuration and thus do not apply to us.


Solution

  • I have now solved the issue. The MTU inside the LXC container was different from the one configured on the virtual switch. The Proxmox interface does not let you set the MTU when creating an LXC container (you would expect the bridge MTU to be used), so this is easy to miss.

    Go to the configuration file of the container; in my case the container ID is 100:

    nano /etc/pve/lxc/100.conf
    

    Find and edit this line:

    net0: name=eno1,bridge=vmbr4002,firewall=1,hwaddr=0A:14:98:05:8C:C5,ip=192.168.0.2/24,type=veth
    

    Add the MTU value configured on the switch at the end of the line (1400 is my value at the vSwitch):

    net0: name=eno1,bridge=vmbr4002,firewall=1,hwaddr=0A:14:98:05:8C:C5,ip=192.168.0.2/24,type=veth,mtu=1400
    

    Reboot the container for a permanent change.

    After that, everything worked like a charm for me. I hope this helps anyone who also creates containers through the Proxmox interface and therefore missed this setting, which has to be configured via the CLI (being able to set it in the interface would be a useful enhancement to Proxmox).
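The manual edit to the `net0` line can also be sketched as a small script. This is only an illustration in Python, assuming the comma-separated `key=value` format of the `net0:` line shown in this post. The helper name `with_mtu` is hypothetical, and it works on a string rather than touching `/etc/pve/lxc/100.conf` directly.

```python
def with_mtu(net_line: str, mtu: int) -> str:
    """Return a 'netN: key=value,...' line with the mtu option set.

    Hypothetical helper: any existing mtu option is replaced so the
    option never appears twice.
    """
    # Split only at the first colon; the hwaddr value contains colons too.
    key, _, options = net_line.partition(":")
    pairs = [p for p in options.strip().split(",") if p]
    pairs = [p for p in pairs if not p.startswith("mtu=")]
    pairs.append(f"mtu={mtu}")
    return f"{key}: {','.join(pairs)}"


line = ("net0: name=eno1,bridge=vmbr4002,firewall=1,"
        "hwaddr=0A:14:98:05:8C:C5,ip=192.168.0.2/24,type=veth")
print(with_mtu(line, 1400))
```

Running the helper twice, or on a line that already carries a different `mtu=` value, still leaves a single `mtu=1400` entry at the end of the line.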