I am receiving intermittent Serializable isolation violation on xid
errors from a stored proc in Redshift so I followed the guidance in the aws docs (https://docs.aws.amazon.com/redshift/latest/dg/c_serial_isolation.html) and changed the db isolation level to Snapshot Isolation with the understanding that this would fix the issue, but unfortunately I am still receiving them (albeit at a lower frequency).
Should these errors still be happening with Snapshot Isolation mode?
There is only one proc that works on the table affected so I am guessing that the order of statements is not relevant here as the sql being run will always be the same, but there could be multiple instances of the proc running at the same time which is presumably the cause.
There is no "fix" for these errors - This is an inherent risk of a MVCC database. If you are not familiar with MVCC coherency then a quick read of https://en.wikipedia.org/wiki/Multiversion_concurrency_control might help.
Let me try to simplify these errors down a bit. It is easy to miss the forest for the trees when talking about complex systems.
That's it. Redshift doesn't "know" that the changes that the other transaction is making is material to the results this transaction is making. Just that it COULD be material. Since it COULD be material then the serialization hazard exists and one transaction is aborted to prevent the possibility of indeterminant results.
Two transactions are in flight making updates and their updates are inputs to the other transaction's calculations. It's a loop - A depends on B which depends on A which depends on B ...
This is a condition created by the inputs to the database and not completely within the database's control to resolve. So the answer to your first question is 'yes' these errors can still be happening.
Being the only code that is updating the table in question is not important here. What's important is what other code is updating the source data of this procedure and if that code depends on the target table of this procedure. The error message reports the xids of the transactions in conflict and you can look up the query text of both. Doing this will give you a good idea of what 2 sets of code are interacting.
As for multiple runs of the same procedure interacting with each other - this seems possible. The procedure would need to scan the table before the table updating statement runs as executing the UPDATE/INSERT would lock the table. The table would also need to have changed between the scans made by the 2 invocations of the procedure, so there would need to be a 3rd instance that completed between these 2 scans. There would need to be some time between this scan of the table and the UPDATE/INSERT statement. Possible but I'm not sure how likely.
Again look at the code for the transactions being flagged to know what is really happening.
Most important take-away - these errors are created by inputs to the database that are circularly dependent. The full solution also lies outside of the database.