Sometime overnight, my service began returning 504 errors on longer-running (30+ second) requests, despite no recent changes to the architecture.
If I hit the Cloud Run-generated URL directly, I can successfully execute 30+ second requests. If I instead hit the public URL (and thus the load balancer), I get timeouts on longer requests.
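The comparison above can be scripted so it is easy to repeat after each change. This is only a rough sketch using the Python standard library: both URLs are hypothetical placeholders for your own endpoints, and the `diagnose` helper and its labels are my own illustration, not anything provided by GCP.

```python
import time
import urllib.error
import urllib.request


def timed_get(url, timeout=120.0):
    """Return (HTTP status, elapsed seconds); status is None on a network error or client timeout."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (including 504) arrive here; keep the status code.
        status = err.code
    except OSError:
        # URLError and socket timeouts are OSError subclasses: no HTTP status available.
        status = None
    return status, time.monotonic() - start


def diagnose(direct_status, lb_status):
    """Classify the outcome of the two requests (labels are illustrative only)."""
    if direct_status == 200 and lb_status == 504:
        return "load balancer timeout"
    if direct_status == 200 and lb_status == 200:
        return "both paths ok"
    return "inconclusive"


if __name__ == "__main__":
    # Hypothetical placeholders: substitute your Cloud Run URL and your public URL.
    direct_url = "https://my-service-abc123-uc.a.run.app/long-task"
    public_url = "https://api.example.com/long-task"

    direct_status, direct_secs = timed_get(direct_url)
    lb_status, lb_secs = timed_get(public_url)
    print(f"direct: status={direct_status} after {direct_secs:.1f}s")
    print(f"via LB: status={lb_status} after {lb_secs:.1f}s")
    print(diagnose(direct_status, lb_status))
```

If the direct request succeeds after more than 30 seconds while the request through the load balancer returns 504 at roughly the 30-second mark, that points at the load balancer rather than the service.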
According to the Google Cloud release notes, there was a recent change to Load Balancing, but nothing related to timeouts: https://cloud.google.com/release-notes
The current setup has been working flawlessly for over a year.
Modifying the backend timeout is disabled for serverless NEGs; see below:
This looks like a bug in GCP load balancing introduced by the last update, since the default timeout should be 60 minutes, not 30 seconds, according to the documentation:
Since the issue affected our production environment, we created a new load balancer, this time choosing the global (classic) LB option. We got a new IP address and swapped it in for the old one at our DNS provider. After that, everything works. We will probably delete the previous LB. I am aware this is a workaround rather than a fix, but production is working and exporting data as before.