java azure-functions azure-blob-trigger

Instances of Azure Functions are sharing variables?


Not sure if the question makes sense, but it's what I'm observing. My Azure Function uses a BlobTrigger to process PDF files that are uploaded to Blob Storage. Things work fine until I upload several blobs at once, in which case, using the code below, I observe the following:

EDIT: To be clear, I understand that logs won't be in order when multiple instances run in parallel. However, when I upload 10 files, instead of getting 10 unique results for lines[19], the majority of the results are duplicates. The issue gets worse later in my code, where I want to do Y based on X: 9 out of 10 invocations produce garbage data.

Main.class

public class main {
    @FunctionName("veninv")
    @StorageAccount("Storage")
    public void blob(
            @BlobTrigger(
                    name = "blob",
                    dataType = "binary",
                    path = "veninv/{name}")
                byte[] content,
            @BindingName("name") String blobname,
            final ExecutionContext context
            ) {

        context.getLogger().info("BlobTrigger by: " + blobname + "(" + content.length + " bytes)");

        // Writing byte[] to a file in Azure Functions file storage
        File tempfile = new File(tempdir, blobname);
        OutputStream os = new FileOutputStream(tempfile);
        os.write(content);
        os.close();

        String[] lines = Pdf.getLines(tempfile);
        context.getLogger().info(lines[19]);
    }
}

Pdf.class

   public static String[] getLines(File PDF) throws Exception {
           PDDocument doc = PDDocument.load(PDF);
           PDFTextStripper pdfStripper = new PDFTextStripper();
           String text = pdfStripper.getText(doc);
           lines = text.split(System.getProperty("line.separator"));
           doc.close();
           return lines;
   }

I don't really understand what's going on here, so I'm hoping for some assistance.


Solution

  • Yes, Azure Functions invocations can share variables. I'd need to see all the code to be 100% certain, but it looks like the lines object is declared as a static field, in which case it is shared across invocations. Try changing it from a static String[] field to a local String[] variable inside getLines and see if the problem goes away.

    Azure Functions are easy to get off the ground, so it's easy to forget about the execution environment. Your function invocations aren't as isolated as they appear. A parent process hosts your function and can run several invocations concurrently on different threads, and many static variables aren't thread-safe. A static variable represents global state: it is not attached to any particular object instance, so the same variable is accessible from every class instance and every thread that references it. The "staticness" of the variable relates to the memory space it sits in, not its value.

    PS. You've solved the issue in your answer here by reducing concurrency, but that may come at a cost to scalability, so I'd recommend load testing it. Also, static variables can be useful. Many are thread-safe, and you do want to use them in Azure Functions, for example for your httpClient or sqlClient DB connections. Give number three a read, here.
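To make the static-versus-local point concrete, here is a minimal sketch. It does not use Azure Functions or PDFBox: a plain string split stands in for Pdf.getLines, and ten pool threads stand in for ten parallel invocations. The names SharedStateDemo, getLinesShared, getLinesLocal and ALL_WRITTEN are hypothetical. The static field is an assumption about how lines is declared in the question, since that declaration isn't shown. The barrier deliberately forces the worst-case interleaving that real invocations would only hit by chance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class SharedStateDemo {
    static final int INVOCATIONS = 10;

    // Makes every simulated invocation finish writing before any of them reads.
    static final CyclicBarrier ALL_WRITTEN = new CyclicBarrier(INVOCATIONS);

    // Assumed shape of the bug: one field shared by every caller.
    // volatile only makes this demo deterministic; it does not remove the sharing.
    static volatile String[] lines;

    static String[] getLinesShared(String text) {
        lines = text.split("\n");   // overwrites whatever another thread stored
        awaitOthers();
        return lines;               // may now be another invocation's array
    }

    static String[] getLinesLocal(String text) {
        String[] localLines = text.split("\n"); // private to this call
        awaitOthers();
        return localLines;
    }

    static void awaitOthers() {
        try {
            ALL_WRITTEN.await();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Runs INVOCATIONS parsers in parallel and counts how many got someone else's text.
    static int countWrongResults(Function<String, String[]> parser) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(INVOCATIONS);
        try {
            List<Callable<Boolean>> tasks = new ArrayList<>();
            for (int i = 0; i < INVOCATIONS; i++) {
                String expected = "invoice-" + i;
                tasks.add(() -> !parser.apply("header\n" + expected)[1].equals(expected));
            }
            int wrong = 0;
            for (Future<Boolean> result : pool.invokeAll(tasks)) {
                if (result.get()) {
                    wrong++;
                }
            }
            return wrong;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("shared field:   " + countWrongResults(SharedStateDemo::getLinesShared) + " wrong");
        System.out.println("local variable: " + countWrongResults(SharedStateDemo::getLinesLocal) + " wrong");
    }
}
```

With the forced interleaving, the shared-field version hands nine of the ten callers the array written by whichever thread stored last, while the local-variable version returns the right array to every caller. In the real function the overlap depends on timing, so the number of bad invocations would vary from run to run.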