azure, azure-aks, azure-form-recognizer, pdf-extraction

Azure Form Recognizer (Document Intelligence) Connected Container Setup


We have a requirement for PDF parsing and are planning to use Azure Form Recognizer (Document Intelligence). Since our client has sensitive information, we don't want to send our data to Azure; instead, we will be using Form Recognizer connected containers and deploying them in our AKS setup. We are planning to train and use a custom model.

Ref: https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/containers/configuration?view=doc-intel-2.1.0

I wanted to know: if I create a custom model and do the model training using the SDK, is the model created inside my container? If so, does that mean that if the container gets corrupted, my model will also be corrupted?

Azure Form Recognizer Studio


Solution

  • When using Azure Form Recognizer together with its connected containers, the training of custom models occurs in the cloud, in your Azure Form Recognizer resource. You send your training data to Azure, and Azure Form Recognizer creates the custom model there. After training completes, you receive a model ID (see the training sketch after this list).

    The container does not inherently have the model within it. Instead, when you run the container, you provide configuration details, one of which is the model ID you received from the training step. When the container starts, it fetches the model from Azure and uses it locally. The model is stored in a local cache directory that you specify, so the container doesn't have to fetch it every time it starts.

    If your container gets corrupted or you need to restart it, the model will still exist in this cache directory, and you won't lose it. Even if the cache directory itself is deleted or corrupted, the authoritative copy of the model still lives in Azure, so the container can re-download it the next time it starts (connectivity permitting); you would only lose the locally cached copy. It's still good practice to back the cache directory with reliable storage, for example a persistent volume in AKS, so restarts don't trigger unnecessary re-downloads. The cache location is supplied as part of the container's startup configuration, typically as a mounted volume, alongside the required Eula=accept, Billing, and ApiKey settings.

    Training data does get sent to Azure during the training process, but when you're using connected containers, the inference (or prediction) happens entirely on-premises. The container doesn't send the documents you're analyzing back to Azure; it processes them locally using the cached model (see the local-analysis sketch below).
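
To make the training flow concrete, here is a minimal sketch using the azure-ai-formrecognizer Python SDK (the 3.x package, which targets the v2.1 API referenced above). The endpoint, key, and SAS URL are placeholders you'd replace with your own values; training runs against your cloud resource, and the artifact you need to keep is the returned model ID:

```python
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import FormTrainingClient

# Training talks to the cloud resource, not the container.
# Placeholders: replace with your own endpoint, key, and SAS URL.
endpoint = "https://<your-resource>.cognitiveservices.azure.com/"
key = "<your-api-key>"
training_data_sas_url = "<SAS URL to a blob container with labeled training docs>"

training_client = FormTrainingClient(endpoint, AzureKeyCredential(key))

# Kick off training; use_training_labels=True trains a labeled custom model.
poller = training_client.begin_training(
    training_data_sas_url, use_training_labels=True
)
model = poller.result()

# This ID is what you later pass to the connected container's configuration.
print("Model ID:", model.model_id)
print("Status:", model.status)
```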
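
Once the container is running in AKS with the model cached, analysis can be pointed at the container's endpoint instead of the cloud one, so documents never leave your cluster. A sketch, assuming the container is reachable at http://localhost:5000 (e.g. via port-forwarding to the AKS service) and exposes the same v2.1 API:

```python
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import FormRecognizerClient

# The endpoint here is the *container's* URL; the key is still needed
# because connected containers report usage to Azure for billing.
container_endpoint = "http://localhost:5000"  # assumption: port-forwarded AKS service
key = "<your-api-key>"
model_id = "<model ID from the training step>"

client = FormRecognizerClient(container_endpoint, AzureKeyCredential(key))

# Analyze a local PDF against the custom model cached in the container.
with open("sample-invoice.pdf", "rb") as f:
    poller = client.begin_recognize_custom_forms(model_id=model_id, form=f)

for recognized_form in poller.result():
    for name, field in recognized_form.fields.items():
        print(f"{name}: {field.value} (confidence {field.confidence})")
```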