azure-synapse, azure-synapse-analytics, azure-synapse-pipeline

Azure Synapse pipeline stuck / hung in "In Progress" state; how to debug? Local VM runtime performance


This appears to be a common problem, but how to resolve it is a complete mystery.

This happens when I try to process many hundreds of files. My pipeline has 1588 items. It works great for an hour or so, then gets down to a few items in progress and nothing happens; it just hangs like this.

Most recent failure among the 6 errors in the output:

{
    "errorCode": "2200",
    "message": "'Type=System.OutOfMemoryException,Message=Exception of type 'System.OutOfMemoryException' was thrown.,Source=Microsoft.DataTransfer.ClientLibrary,'",
    "failureType": "UserError",
    "target": "Copy EIP CSV Extract JB",
    "details": []
}

3 items stuck in progress

If I process only 25 files in a trigger, the pipeline runs fine. The hanging appears to happen only when I run it against hundreds of files at a time.
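One thing I have been looking at is the ForEach concurrency. As I understand it, when isSequential is false the ForEach runs up to 20 iterations in parallel by default (batchCount caps it, up to 50), so queuing hundreds of files means the local runtime is handling many large copies at once. A rough sketch of what I mean (the GetFileList name is a placeholder, not my exact pipeline):

{
    "name": "ForEachFile",
    "type": "ForEach",
    "typeProperties": {
        "isSequential": false,
        "batchCount": 5,
        "items": {
            "value": "@activity('GetFileList').output.childItems",
            "type": "Expression"
        },
        "activities": [
            {
                "name": "Copy EIP CSV Extract JB",
                "type": "Copy"
            }
        ]
    }
}

Dropping batchCount to something small should at least tell me whether the number of concurrent copies on the local VM is the bottleneck.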

Since there is no clear error, I don't even understand how to debug this. It is very frustrating.
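The only debugging handle I have found so far is the copy session log: the Copy activity output below includes a logFilePath, so logging is already enabled. As far as I can tell that is configured with a logSettings block roughly like the sketch below, and setting logLevel to Info should record every file processed, which might at least show which file was in flight when things hung (the linked service name here is a placeholder, not mine):

"logSettings": {
    "enableCopyActivityLog": true,
    "copyActivityLogSettings": {
        "logLevel": "Info",
        "enableReliableLogging": false
    },
    "logLocationSettings": {
        "linkedServiceName": {
            "referenceName": "LogStorageLinkedService",
            "type": "LinkedServiceReference"
        },
        "path": "copyactivity-logs"
    }
}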

This is the output for each of those items:

For each:
{
    "ItemsCount": "515"
}

IfSizeLargerThan25MB_FileExtension_txt_gz
{}

Copy EIP CSV Extract JBV2
{
"dataRead": 51933184,
"dataWritten": 0,
"filesRead": 1,
"filesWritten": 0,
"sourcePeakConnections": 1,
"sinkPeakConnections": 1,
"rowsRead": 10982475,
"rowsCopied": 10982475,
"copyDuration": 3562,
"throughput": 14.604,
"logFilePath": "copyactivity-logs/Copy EIP CSV Extract JB/....4ebb/",
"errors": [],
"effectiveIntegrationRuntime": "FC-xxxxxxx",
"usedParallelCopies": 1,
"executionDetails": [
    {
        "source": {
            "type": "FileServer"
        },
        "sink": {
            "type": "AzureBlobFS",
            "region": "West US 3"
        },
        "status": "Canceled",
        "start": "9/20/2023, 5:10:19 PM",
        "duration": 3562,
        "usedParallelCopies": 1,
        "profile": {
            "queue": {
                "status": "Completed",
                "duration": 6
            },
            "transfer": {
                "status": "InProgress",
                "duration": 3556,
                "details": {
                    "listingSource": {
                        "type": "FileServer",
                        "workingDuration": 0
                    },
                    "readingFromSource": {
                        "type": "FileServer",
                        "workingDuration": 22
                    },
                    "writingToSink": {
                        "type": "AzureBlobFS",
                        "workingDuration": 3
                    }
                }
            }
        },
        "detailedDurations": {
            "queuingDuration": 6,
            "transferDuration": 3556
        }
    }
],
"dataConsistencyVerification": {
    "VerificationResult": "Verified"
},
"durationInQueue": {
    "integrationRuntimeQueue": 6
}

}

Any ideas?

Thanks,

John


Solution

  • We increased the memory on the 6-core local VM from 16 GB to 24 GB. I then reran the pipeline against twice as many files, and it completed successfully with no errors or stalls. I'm pretty sure the local VM's resources were the problem.

    Local VM resources during the hang, with out-of-memory pipeline errors: (screenshot)

    Local VM resources after increasing memory and running the pipeline on 2x the files: (screenshot)