out-of-memory azure-functions azure-blob-storage clam

Azure function host dies with OutOfMemoryException, without triggering


Quick version: Why does my function host kill itself after 5 minutes without doing anything?

I have an Azure Function that uses nClam to scan blob files for viruses. It seems to work just fine, but sometimes it kills itself before even triggering on any blob! It simply shuts down after 5 minutes with an OutOfMemoryException:

[18/9/2019 10:33:33] Host initialized (405ms)
[18/9/2019 10:33:33] Host started (812ms)
[18/9/2019 10:33:33] Job host started
Hosting environment: Production
Content root path: D:\git\TopoAPI\Antivirus\bin\Debug\netcoreapp2.1
Now listening on: http://0.0.0.0:7071
Application started. Press Ctrl+C to shut down.
[18/9/2019 10:33:38] Host lock lease acquired by instance ID '000000000000000000000000C913FBA0'.
[18/9/2019 10:38:46] An unhandled exception has occurred. Host is shutting down.
[18/9/2019 10:38:46] Microsoft.WindowsAzure.Storage: Exception of type 'System.OutOfMemoryException' was thrown. System.Private.CoreLib: Exception of type 'System.OutOfMemoryException' was thrown.
[18/9/2019 10:38:46] Stopping host...
[18/9/2019 10:38:46] Stopping JobHost
[18/9/2019 10:38:46] Job host stopped
[18/9/2019 10:38:46] Host shutdown completed.
Application is shutting down...

Below is my blob-triggered scanning function:

[FunctionName("scanImports")]
public static async Task Scan(
    [BlobTrigger("imports/{newBlobName}", Connection = "BlobConnectionstring")] CloudBlockBlob newBlob,
    string newBlobName,
    ILogger log,
    ExecutionContext context)
{
    var config = new ConfigurationBuilder()
        .SetBasePath(context.FunctionAppDirectory)
        .AddJsonFile("local.settings.json", optional: true, reloadOnChange: true)
        .AddEnvironmentVariables()
        .Build();

    var clamClient = new ClamClient(config["ContainerAddress"], int.Parse(config["ContainerPort"]));
    var blobFileStream = await newBlob.OpenReadAsync();

    using (var memoryStream = new MemoryStream())
    {
        await blobFileStream.CopyToAsync(memoryStream);
        var result = await clamClient.SendAndScanFileAsync(memoryStream.ToArray());
        bool isClean = result.InfectedFiles == null || result.InfectedFiles.Count == 0;

        // If newBlob is infected, move it to the quarantine container.
        if (!isClean)
        {
            CloudStorageAccount storageAccount = CloudStorageAccount.Parse(config["BlobConnectionstring"]);
            CloudBlobClient client = storageAccount.CreateCloudBlobClient();
            CloudBlobContainer container = client.GetContainerReference(config["QuarantineBlobName"]);

            await container.CreateIfNotExistsAsync();

            CloudBlockBlob quarantineBlob = container.GetBlockBlobReference(newBlobName);
            quarantineBlob.Properties.ContentType = newBlob.Properties.ContentType;

            // Rewind before uploading: CopyToAsync left the position at the
            // end of the stream, so without this reset an empty blob is written.
            memoryStream.Position = 0;
            await quarantineBlob.UploadFromStreamAsync(memoryStream);
            await newBlob.DeleteAsync();
        }
    }
}
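As an aside, buffering the whole blob into a MemoryStream and then copying it again with ToArray() holds two full copies of the file in memory at once. nClam also exposes a SendAndScanFileAsync overload that accepts a Stream (assuming the overload exists on your nClam version), so the blob's read stream can be passed straight through. A sketch of the scanning portion, reusing the config and newBlob from the function above:

```csharp
// Sketch: stream the blob directly to ClamAV instead of double-buffering.
// Assumes nClam's SendAndScanFileAsync(Stream) overload is available.
var clamClient = new ClamClient(config["ContainerAddress"], int.Parse(config["ContainerPort"]));

using (var blobFileStream = await newBlob.OpenReadAsync())
{
    var result = await clamClient.SendAndScanFileAsync(blobFileStream);
    bool isClean = result.InfectedFiles == null || result.InfectedFiles.Count == 0;
    // ... quarantine logic as above ...
}
```

This keeps per-invocation memory roughly constant regardless of blob size (ClamAV still enforces its own maximum stream size on the daemon side).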

Update 1: The host dies after exactly 5 minutes with an OutOfMemoryException. I have tried extending the function timeout; that made no difference. During those 5 minutes the process constantly uses 5-8% CPU, and just before it dies it is using over 1500 MB of memory.

Update 2: If I remove all code from the function, leaving only a single log.LogInformation() statement, the host still kills itself after 5 minutes with an OutOfMemoryException.


Solution

  • The cause of my problem was that the container I was triggering on held 100,000+ blobs. Microsoft defines that as high scale.

    So my code actually had no problems. The blob trigger scans the entire container on startup to detect new and changed blobs, so when it is pointed at a "high scale" container the host does nothing but allocate memory for that scan until it dies.
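For containers this large, Microsoft's documented recommendation is to react to Blob-created events via Event Grid rather than the classic blob trigger, since Event Grid pushes individual events and never scans the container. A minimal sketch (the function name and event wiring here are illustrative assumptions, using the EventGridEvent type from the Event Grid SDK):

```csharp
// Sketch of an Event Grid-triggered scan. Assumes an Event Grid subscription
// on the storage account, filtered to Microsoft.Storage.BlobCreated events.
[FunctionName("scanImportsEventGrid")]
public static void Run(
    [EventGridTrigger] EventGridEvent blobCreatedEvent,
    ILogger log)
{
    // The event payload carries the blob URL; download and scan it from here
    // instead of relying on the blob trigger's container scan.
    log.LogInformation("Blob created: {subject}", blobCreatedEvent.Subject);
}
```

Startup cost then no longer depends on how many blobs already exist in the container.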