I am trying to convert input text data into embeddings using AWS Bedrock. I need to use batch processing because I have a large dataset.
I have tried different formats, but every time the job fails with the following error:
'status': 'Failed',
'message': 'Invalid JSON format encountered in file: abc.jsonl',
Formats that I have tried in abc.jsonl:
{
"recordId" : "CALL0000001",
"modelInput": {
"inputText":"this is where you place your input text"
}
}
{
"modelInput": {
"inputText":"this is where you place your input text"
}
}
According to this model's documentation, either of the above formats should work, but neither does: https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-titan-embed-text.html
Can someone please tell me what the exact format should be?
As the documentation says:
To prepare inputs for batch inference, create a .jsonl file in the following format:
{ "recordId" : "11 character alphanumeric string", "modelInput" : {JSON body} }
Each line contains a JSON object with a recordId field and a modelInput field containing the request body for an input you want to submit. The format of the modelInput JSON object must match the body field for the model that you use in the InvokeModel request. For more information, see Inference request parameters and response fields for foundation models.
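A minimal sketch of building one record in that shape. It assumes the Titan Text Embeddings request body is just {"inputText": ...}, as in the question, and reuses the question's CALL prefix as a hypothetical recordId scheme (the prefix plus a 7-digit counter gives the 11 alphanumeric characters the documentation asks for). The helper names are illustrative only.

```python
import re


def make_record_id(index: int, prefix: str = "CALL") -> str:
    # Hypothetical scheme: "CALL" + 7-digit zero-padded counter = 11 characters
    record_id = f"{prefix}{index:07d}"
    if not re.fullmatch(r"[A-Za-z0-9]{11}", record_id):
        raise ValueError(f"recordId must be 11 alphanumeric characters: {record_id!r}")
    return record_id


def make_record(index: int, text: str) -> dict:
    # modelInput must match the InvokeModel body for the model; for Titan
    # Text Embeddings it is assumed here to be {"inputText": ...}
    return {"recordId": make_record_id(index), "modelInput": {"inputText": text}}


if __name__ == "__main__":
    print(make_record(1, "this is where you place your input text"))
```

The check raises as soon as the counter outgrows 7 digits, so an invalid recordId is caught locally instead of inside the batch job.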
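The whole problem comes from the difference between JSON and JSON Lines: they don't have the same format. In a JSON Lines file, each line must be one complete, valid JSON value, so an object pretty-printed across several lines (as in the question) is invalid. Put each record on a single line, for example:
{"recordId": "CALL0000001", "modelInput": {"inputText": "this is where you place your input text"}}
For more details, see https://jsonlines.org/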
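A minimal sketch of the fix using only the Python standard library: serialize each record with json.dumps, which emits a single line when no indent argument is given, and write one object per line. The file name abc.jsonl comes from the question; validate_jsonl is a hypothetical local check, not part of Bedrock, and it is deliberately strict (it also flags blank lines).

```python
import json


def write_jsonl(records, path):
    # json.dumps without indent produces a single-line string,
    # so each record occupies exactly one line of the file
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def validate_jsonl(path):
    """Return (line_number, error) pairs for lines that are not standalone JSON objects."""
    errors = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                # flag empty lines too, to keep the check strict
                errors.append((line_no, "blank line"))
                continue
            try:
                obj = json.loads(line)
            except json.JSONDecodeError as exc:
                errors.append((line_no, str(exc)))
                continue
            if not isinstance(obj, dict):
                errors.append((line_no, "not a JSON object"))
    return errors


if __name__ == "__main__":
    records = [
        {"recordId": "CALL0000001", "modelInput": {"inputText": "first text"}},
        {"recordId": "CALL0000002", "modelInput": {"inputText": "second text"}},
    ]
    write_jsonl(records, "abc.jsonl")
    print(validate_jsonl("abc.jsonl"))  # [] means every line parsed on its own
```

Running validate_jsonl on a file written with json.dumps(record, indent=2), like the formats in the question, reports an error starting at line 1, because a lone "{" is not valid JSON by itself.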