I have created a solution version using the "similar-items" recipe in Amazon Personalize and am trying to test it with a batch inference job. I followed the AWS documentation, which states that the input should be a list of itemIds, with a maximum of 500 items, and each itemId on its own line:
{"itemId": "105"}
{"itemId": "106"}
{"itemId": "441"}
...
Accordingly, I wrote the following code to transform my item_ids column into the described JSON format:
import json

# convert ITEM_ID column to the required JSON format with new lines entered between items
items_json = items_df['ITEM_ID'][1:200].to_json(orient='columns').replace(',', '}\n{')

# write output to json file
with open('items_json.json', 'w') as f:
    json.dump(items_json, f)
# write file to S3
from io import StringIO
import s3fs
import boto3

# Connect to S3 default profile
s3 = boto3.client('s3')
s3.put_object(
    Body=json.dumps(items_json),
    Bucket='bucket',
    Key='personalize/batch-recommendations-input/items_json.json'
)
Then when I run the batch inference job with that as input, it gives the following error: "User error: Input JSON is malformed."
My sample JSON input looks as follows:
"{"itemId":"12637"} {"itemId":"12931"} {"itemId":"13005"}"
and after copying it to S3 it looks as follows (backslashes get added to it; I don't know if that's significant in any way):
"{\"itemId\":\"12637\"}\n{\"itemId\":\"12931\"}\n{\"itemId\":\"13005\"}"
To me, my format looks quite similar to what the documentation asks for. Any clue what might be causing the error?
You just need some small changes to the use of to_json. Specifically, orient should be "records" and lines should be True.
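For context on why your original file is rejected: json.dump (and json.dumps) serializes the whole Python string as a single JSON string literal, escaping the inner quotes and newlines, which is exactly the backslashed output you saw in S3. A quick way to see this:

import json

items_json = '{"itemId":"12637"}\n{"itemId":"12931"}'
print(json.dumps(items_json))
# prints: "{\"itemId\":\"12637\"}\n{\"itemId\":\"12931\"}"
# i.e. one quoted string, not one JSON object per line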
Full example:
import pandas as pd
import boto3
items_df = pd.read_csv("...")
# Make sure item ID column name is "itemId"
item_ids_df = items_df.rename(columns={"ITEM_ID": "itemId"})[["itemId"]]
# Write df to file in JSON lines format
item_ids_df.to_json("job_input.json", orient="records", lines=True)
# Upload to S3 (replace 'bucket' with your bucket name)
bucket = 'bucket'
boto3.Session().resource('s3').Bucket(bucket).Object("job_input.json").upload_file("job_input.json")
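The resulting job_input.json should contain plain JSON Lines, one object per line with no surrounding quotes or escaped characters, e.g.:

{"itemId":"12637"}
{"itemId":"12931"}
{"itemId":"13005"}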
Lastly, you mentioned that the maximum number of input items is 500. Actually, your input file can contain up to 50 million items or be up to 1 GB in size.
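In case it helps, here is a minimal sketch of launching the batch inference job against that input with boto3; the ARNs, role, and S3 paths below are placeholders you would replace with your own:

import boto3

personalize = boto3.client('personalize')

# Placeholder ARNs/paths; the IAM role must have read access to the input
# prefix and write access to the output prefix.
response = personalize.create_batch_inference_job(
    jobName='similar-items-batch-job',
    solutionVersionArn='arn:aws:personalize:<region>:<account>:solution/<name>/<version>',
    roleArn='arn:aws:iam::<account>:role/<personalize-s3-role>',
    jobInput={'s3DataSource': {'path': 's3://bucket/personalize/batch-recommendations-input/job_input.json'}},
    jobOutput={'s3DataDestination': {'path': 's3://bucket/personalize/batch-recommendations-output/'}}
)
print(response['batchInferenceJobArn'])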