I have an indexer in Azure that uses a skillset to split content into pages and then generate vector embeddings for the index:
{
  "@odata.context": "https://redacted/$metadata#skillsets/$entity",
  "@odata.etag": "\"something\"",
  "name": "something-skillset",
  "description": "",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
      "name": "Text split skill",
      "description": "Splits text into pages small enough to vectorize",
      "context": "/document",
      "defaultLanguageCode": "en",
      "textSplitMode": "pages",
      "maximumPageLength": 2000,
      "pageOverlapLength": 500,
      "maximumPagesToTake": 0,
      "inputs": [
        {
          "name": "text",
          "source": "/document/content"
        }
      ],
      "outputs": [
        {
          "name": "textItems",
          "targetName": "/document/pages"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
      "name": "Create vector embedding for pages",
      "description": "",
      "context": "/document",
      "resourceUri": "https://something.openai.azure.com",
      "apiKey": "<redacted>",
      "deploymentId": "text-embedding-ada-002",
      "inputs": [
        {
          "name": "text",
          "source": "/document/pages/*"
        }
      ],
      "outputs": [
        {
          "name": "embedding",
          "targetName": "contentVector"
        }
      ],
      "authIdentity": null
    }
  ],
  "cognitiveServices": {
    "@odata.type": "#Microsoft.Azure.Search.DefaultCognitiveServices",
    "description": null
  },
  "knowledgeStore": null,
  "indexProjections": null,
  "encryptionKey": null
}
But when my indexer runs the skillset on a document I get the following error:
Required skill input was not of the expected type 'String'. Name: 'text', Source: '$(/document/pages/*)'.
Expression language parsing issues:
Cannot iterate over non-array '/document/pages'.
I cannot for the life of me see what I have done wrong.
Two settings are wrong: the context of the Azure OpenAI embedding skill and the output targetName of the text split skill.
In the embedding skill, change the context from "/document" to "/document/pages/*".
In the text split skill, change the output targetName from "/document/pages" to "pages". A targetName is a node name created under the skill's context, not a full path.
This should resolve your error.
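Essentially, the embedding skill takes a single string per call and returns one embedding for it. The split skill produces an array of pages, so the embedding skill's context must be set to the array items ("/document/pages/*"). The skill then runs once per page, and its source "/document/pages/*" resolves to one string each time.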
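For reference, here is a sketch of the two skills with both changes applied. Everything else is copied unchanged from the question, and the surrounding skillset properties are omitted. One assumption to check on your side: with this context the embedding should end up at "/document/pages/*/contentVector", so any output field mapping or index projection that expected it directly under "/document" (not shown in the question) would presumably need adjusting.

```json
{
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
      "name": "Text split skill",
      "description": "Splits text into pages small enough to vectorize",
      "context": "/document",
      "defaultLanguageCode": "en",
      "textSplitMode": "pages",
      "maximumPageLength": 2000,
      "pageOverlapLength": 500,
      "maximumPagesToTake": 0,
      "inputs": [
        {
          "name": "text",
          "source": "/document/content"
        }
      ],
      "outputs": [
        {
          "name": "textItems",
          "targetName": "pages"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
      "name": "Create vector embedding for pages",
      "description": "",
      "context": "/document/pages/*",
      "resourceUri": "https://something.openai.azure.com",
      "apiKey": "<redacted>",
      "deploymentId": "text-embedding-ada-002",
      "inputs": [
        {
          "name": "text",
          "source": "/document/pages/*"
        }
      ],
      "outputs": [
        {
          "name": "embedding",
          "targetName": "contentVector"
        }
      ],
      "authIdentity": null
    }
  ]
}
```

The only differences from the original are the split skill's targetName ("pages") and the embedding skill's context ("/document/pages/*").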