Problem Background
We have a React-based (version 16.14.0) PWA hosted on Microsoft Azure App Service plans (premium tier).
We've recently seen an increase in a sporadic issue where some backend endpoints return 500 errors. My initial thought was that it's a backend problem; however, I can't see how it is, and I need some new theories to try out :|
Randomly, when connecting to the app, we notice network errors: one (or more) of the three possible backends the UI talks to starts returning 500 errors. Whilst in this "state", ALL POST requests to that particular backend fail.
It ONLY affects POST endpoints. GET endpoints on these backends continue to work (so ruling out DNS issues). ALL POST endpoints return a 500 (there was one exception, but we concluded this was because that POST request didn't have a payload!). The OPTIONS (preflight) requests for these POST requests successfully return a 204; it's the actual request that gets a 500.
In our test environments, we are only hosted on one backend instance on the app service plan (which hasn't changed during the failing tests), so it's not a load balancing issue with a dodgy node in the pool.
Azure app service monitoring tools are limited... but I cannot see any activity to suggest these calls are actually ever making it to the backend. There's nothing in application insights, nothing in the failed request logs.
It affects all our backends. The UI and the C# .NET Framework 4.8.1 backend are hosted on one app service plan, and we see these failures sporadically here. The UI also fails against two other app services that are built on .NET 8 (that run on the same app service plan).
Browser error received is: Failed to load resource: net::ERR_FAILED
I have two questions. Firstly, why does it only affect POST? I'm hoping that understanding this may help identify the root cause of the problem. Secondly, do you think it could be a front-end caused 500? I've seen evidence that some things within React (e.g. react-router) can return 500s. I always thought a 500 response would HAVE to come from the backend, but as of now I'm not that sure, so I'm seeking some clarification.
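One way to narrow down the second question is to log whether the failing call ever produced an HTTP status at all. In a browser, `fetch` rejects (with a `TypeError`) when the request fails at the network level, which is how something like `net::ERR_FAILED` would normally surface, whereas a genuine 500 arrives as a resolved response with a status code. The following is only a minimal sketch under that assumption; `classifyRequest`, `FetchLike` and `Outcome` are hypothetical names made up for illustration, not part of React or any library.

```typescript
// Minimal shape of fetch that this sketch relies on (hypothetical type, not a library API).
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string }
) => Promise<{ status: number }>;

type Outcome =
  | { kind: "http"; status: number } // something actually answered with a status code
  | { kind: "network"; message: string }; // the request failed before any response arrived

// Send one request and report whether it produced an HTTP status or a network-level failure.
async function classifyRequest(
  doFetch: FetchLike,
  url: string,
  method: "GET" | "POST",
  payload?: unknown
): Promise<Outcome> {
  const init: { method: string; headers?: Record<string, string>; body?: string } = { method };
  if (payload !== undefined) {
    // Only attach a body when there is a payload, to compare POSTs with and without one.
    init.headers = { "Content-Type": "application/json" };
    init.body = JSON.stringify(payload);
  }
  try {
    const res = await doFetch(url, init);
    return { kind: "http", status: res.status };
  } catch (err) {
    // A rejected promise means no HTTP response was received for this request.
    return { kind: "network", message: err instanceof Error ? err.message : String(err) };
  }
}
```

In the app you would pass the browser's own `fetch` as `doFetch` and call this for a GET, a POST with a payload and a POST without one while the UI is in the failing "state". If the POSTs come back as `network` rather than `http`, that would suggest the 500 shown in the tooling is not a response generated by the backend code.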
I would raise a support ticket with Microsoft, but after raising many tickets for far simpler problems in the past, (e.g. when their services go down), I've realised this is a complete farce and waste of time which I should be using to identify the problem.
Many thanks in advance for any questions, suggestions and answers.
What we've tried:
What we are yet to try, but is on the list:
What can "fix" the error?
- Restarting the browser
- Waiting X minutes, which seems to be anywhere from 5 to 15 minutes (I'm still trying to identify how long X is)
- Using a different browser session (e.g. opening Chrome in incognito mode)
This was a problem with Azure.
The current workaround is to change your app service to use HTTP 1.1.
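After changing the setting, it may be worth confirming from the browser which protocol each backend is actually being reached over. A rough sketch, assuming the browser exposes the Resource Timing `nextHopProtocol` field for your requests (it can be an empty string for cross-origin resources that don't opt in); `protocolsByOrigin` and `TimingEntry` are hypothetical names used only for this example.

```typescript
// Subset of PerformanceResourceTiming that this sketch reads.
interface TimingEntry {
  name: string; // full URL of the resource
  nextHopProtocol: string; // e.g. "h2" or "http/1.1"; may be "" when the browser withholds it
}

// Group the protocols seen per origin, so you can check a backend is no longer reached over "h2".
function protocolsByOrigin(entries: TimingEntry[]): Record<string, string[]> {
  const result: Record<string, string[]> = {};
  for (const entry of entries) {
    const name = entry.name;
    const schemeEnd = name.indexOf("://");
    // Skip anything that is not an http(s) URL, such as data: or blob: entries.
    if (schemeEnd < 0 || name.indexOf("http") !== 0) continue;
    const pathStart = name.indexOf("/", schemeEnd + 3);
    const origin = pathStart < 0 ? name : name.slice(0, pathStart);
    const protocol = entry.nextHopProtocol || "unknown";
    const seen = result[origin] ?? (result[origin] = []);
    if (seen.indexOf(protocol) < 0) seen.push(protocol);
  }
  return result;
}

// In the browser console (after using the app for a bit), something like:
//   protocolsByOrigin(performance.getEntriesByType("resource") as PerformanceResourceTiming[])
```

If a backend origin still lists `h2` after the change, the browser may be reusing an old connection, so retry in a fresh session before drawing conclusions.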
The App Service product team are currently investigating the problem.
The following post appears to be kept up to date by them (more so than my support ticket, which is still stuck with our CSP, CDW)