
Skillset error in Azure when adding vector embeddings to index


I have an indexer in Azure that uses a skillset to split content into pages and then generate vector embeddings for the index:

{
  "@odata.context": "https://redacted/$metadata#skillsets/$entity",
  "@odata.etag": "\"something\"",
  "name": "something-skillset",
  "description": "",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
      "name": "Text split skill",
      "description": "Splits text into pages small enough to vectorize",
      "context": "/document",
      "defaultLanguageCode": "en",
      "textSplitMode": "pages",
      "maximumPageLength": 2000,
      "pageOverlapLength": 500,
      "maximumPagesToTake": 0,
      "inputs": [
        {
          "name": "text",
          "source": "/document/content"
        }
      ],
      "outputs": [
        {
          "name": "textItems",
          "targetName": "/document/pages"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
      "name": "Create vector embedding for pages",
      "description": "",
      "context": "/document",
      "resourceUri": "https://something.openai.azure.com",
      "apiKey": "<redacted>",
      "deploymentId": "text-embedding-ada-002",
      "inputs": [
        {
          "name": "text",
          "source": "/document/pages/*"
        }
      ],
      "outputs": [
        {
          "name": "embedding",
          "targetName": "contentVector"
        }
      ],
      "authIdentity": null
    }
  ],
  "cognitiveServices": {
    "@odata.type": "#Microsoft.Azure.Search.DefaultCognitiveServices",
    "description": null
  },
  "knowledgeStore": null,
  "indexProjections": null,
  "encryptionKey": null
}

But when my indexer runs the skillset on a document I get the following error:

Required skill input was not of the expected type 'String'. Name: 'text', Source: '$(/document/pages/*)'.
Expression language parsing issues:
Cannot iterate over non-array '/document/pages'.

I cannot for the life of me see what I have done wrong.


Solution

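  • The context of the Azure OpenAI embedding skill and the output targetName of the text split skill are both set incorrectly.

    Change the context from "/document" to "/document/pages/*" in the Azure OpenAI embedding skill.

    And

    Change the targetName of the text split skill's output from "/document/pages" to "pages". A targetName is the name of a node created under the skill's context, not a full path.

    This will resolve your error.

    Essentially, the embedding skill's "text" input must be a single string, not an array. Setting the context to "/document/pages/*" makes the skill run once per page, so each call receives one string and writes its embedding under that page, at "/document/pages/*/contentVector".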
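
For reference, here is a sketch of how the "skills" array from the question might look with the two changes above applied. Only the split skill's targetName and the embedding skill's context differ from the original; every other value, including the redacted resourceUri and apiKey, is copied from the question as-is. Treat it as an illustration rather than a verified configuration.

```json
{
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
      "name": "Text split skill",
      "description": "Splits text into pages small enough to vectorize",
      "context": "/document",
      "defaultLanguageCode": "en",
      "textSplitMode": "pages",
      "maximumPageLength": 2000,
      "pageOverlapLength": 500,
      "maximumPagesToTake": 0,
      "inputs": [
        {
          "name": "text",
          "source": "/document/content"
        }
      ],
      "outputs": [
        {
          "name": "textItems",
          "targetName": "pages"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
      "name": "Create vector embedding for pages",
      "description": "",
      "context": "/document/pages/*",
      "resourceUri": "https://something.openai.azure.com",
      "apiKey": "<redacted>",
      "deploymentId": "text-embedding-ada-002",
      "inputs": [
        {
          "name": "text",
          "source": "/document/pages/*"
        }
      ],
      "outputs": [
        {
          "name": "embedding",
          "targetName": "contentVector"
        }
      ],
      "authIdentity": null
    }
  ]
}
```

With this layout the split skill writes its array to "/document/pages", and the embedding skill is evaluated once for each element of that array.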