snowflake-cloud-data-platform

Preserving the load history when re-creating pipes in Snowflake


Is there any way to preserve the load history when re-creating pipes (using CREATE OR REPLACE)?

We do a lot of automated CI/CD on Snowflake, and sometimes pipes need to get re-created. When this happens, the load history is lost. Right now, the accepted workaround is a manual process, which doesn't work very well in an automated workflow.

This makes refreshing pipes dangerous, as duplicate data could be loaded. There is also a danger of losing some notifications/files while the pipe is being re-created -- with or without the manual process, automated or not (which is unacceptable, for obvious reasons).

I wish there was a simple parameter to enable this. Something like:

CREATE OR REPLACE PIPE my_pipe
  PRESERVE_HISTORY = [ TRUE | FALSE ]
AS <copy_statement>

An alternative to this would be an option/parameter for pipes to share the load history with the table instead. This way, when the pipe is re-created (but the table isn't), the load history is preserved. If the table is dropped/truncated, then the load history for both the table and the pipe would be lost.

Another option would be the ability to modify pipes using an ALTER command instead, but currently this is very limited. This way, we wouldn't even need to re-create the pipe in the first place.

EDIT: Tried automating the manual process with a procedure, but there's a still chance of losing notifications.


Solution

  • Since preserving the load history doesn't seem possible currently, I explored a few alternatives:

    tl;dr: Here is the solution.

    Deleting/Removing/Moving the files after ingestion

    Refreshing a subset of the pipe

    Replaying the missed notifications

    Monitoring the load of every staged file