I'm uploading a lot of DOCX and PDF files into blob storage to be used with Azure Cognitive Search. I'm using it to experiment with some AI capabilities, and it works well, but I would like to try filterable freshness. I'm not sure how the metadata for these PDF files (e.g., 'author', 'date', 'title') can be added through a skill. Any advice would be appreciated. Thanks
You can expose blob metadata such as `metadata_creation_date` through a Shaper skill and then map it into the index with index projections. For example:

```json
{
  "@odata.context": ...,
  "@odata.etag": ...,
  "name": "freshness",
  "description": "Skillset to chunk documents and generate embeddings",
  "skills": [
    {
      ...
    },
    {
      "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
      "name": "#3",
      "description": "Extracts metadata from the document",
      "context": "/document",
      "inputs": [
        {
          "name": "metadata_creation_date",
          "source": "/document/metadata_creation_date"
        }
      ],
      "outputs": [
        {
          "name": "output",
          "targetName": "creationDate"
        }
      ]
    }
  ],
  "cognitiveServices": null,
  "knowledgeStore": null,
  "indexProjections": {
    "selectors": [
      {
        "targetIndexName": "freshness",
        "parentKeyFieldName": "parent_id",
        "sourceContext": "/document/pages/*",
        "mappings": [
          {
            "name": "creationDate",
            "source": "/document/creationDate",
            "sourceContext": null,
            "inputs": []
          }
        ]
      }
    ],
    "parameters": {
      "projectionMode": "skipIndexingParentDocuments"
    }
  },
  "encryptionKey": null
}
```
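For the projection to be filterable, the target index needs a matching `Edm.DateTimeOffset` field marked filterable. A minimal sketch of that field definition (the field name `creationDate` matches the projection above; `sortable` is optional but useful for ordering results by freshness):

```json
{
  "name": "creationDate",
  "type": "Edm.DateTimeOffset",
  "filterable": true,
  "sortable": true,
  "retrievable": true
}
```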
If you already have an index, you can create a new field of type `Edm.DateTimeOffset`. After creating it, map the source field to it in the indexer's `fieldMappings`:
```json
"fieldMappings": [
  {
    "sourceFieldName": "metadata_storage_path",
    "targetFieldName": "metadata_storage_path",
    "mappingFunction": {
      "name": "base64Encode",
      "parameters": null
    }
  },
  {
    "sourceFieldName": "metadata_storage_last_modified",
    "targetFieldName": "last_modified"
  }
]
```
Alternatively, while importing data you can make the field filterable in the Customize target index step by checking the Filterable box, as shown in the image.
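Once the field is filterable, you can use it in a freshness filter at query time. A sketch using the REST API (the service name, index name, api-version, and cutoff date here are placeholders):

```
GET https://<service>.search.windows.net/indexes/freshness/docs?api-version=2023-11-01&search=*&$filter=last_modified ge 2024-01-01T00:00:00Z
```

This returns only documents whose `last_modified` value is on or after the given date.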