
CreateML Recommender Training Error: Item IDs in the recommender model must be numbered 0, 1, ..., num_items - 1

I'm using CreateML to generate a Recommender model using an implicit dataset of the format: User ID, Item ID. The data is loaded into CreateML as a CSV with about 400k rows.

When attempting to 'Train' the model, I receive the following error:

Training Error: Item IDs in the recommender model must be numbered 0, 1, ..., num_items - 1

My dataset is in the following format:


I've tried modifying both Item ID and User ID to enumerated IDs, but I still receive the training error. Example:


I receive this error both within the CreateML UI and when using CreateML within a Swift playground. I've also tried removing duplicates and verified that the maximum ID for each column is (num_items - 1).
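
A minimal sketch of the re-enumeration and check described above, in pandas. The column names `user_id` and `item_id`, the helper names, and the toy data are assumptions for illustration, not the actual dataset. As noted, data that passes this check still triggered the training error.

```python
import pandas as pd

def enumerate_ids(df, columns=("user_id", "item_id")):
    """Return a copy of df with each listed column remapped to 0..n-1."""
    out = df.copy()
    for col in columns:
        # pd.factorize assigns integer codes 0, 1, ... in order of first appearance
        codes, _ = pd.factorize(out[col])
        out[col] = codes
    return out

def is_contiguous(series):
    """True if the distinct values are exactly 0, 1, ..., n_unique - 1."""
    unique = sorted(series.unique().tolist())
    return unique == list(range(len(unique)))

# Toy data standing in for the real CSV (hypothetical values)
toy = pd.DataFrame({"user_id": ["a", "b", "a", "c"], "item_id": [10, 42, 42, 7]})
remapped = enumerate_ids(toy)
```

Calling `is_contiguous(remapped["item_id"])` and `is_contiguous(remapped["user_id"])` confirms that both columns satisfy the literal wording of the error message.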

I've searched for documentation on what the exact requirement is for the set of IDs with no luck.

Thank you in advance for any help clarifying this error message.


  • I was able to discuss this issue with Apple's CoreML developers during WWDC2020. They described this as a known bug which will be fixed with the upcoming OS (Big Sur). The work-around for this bug is:

    In the CSV dataset, add records for a single mock user who interacts with ALL items, and add records for a single mock item that is interacted with by ALL users.

    Using pandas in Python, I essentially implemented the following:

    import csv
    import pandas as pd

    # ratings_df is the original interactions DataFrame with columns item_id and user_id
    # Find the unique item ids
    item_ids = ratings_df.item_id.unique()
    # Find the unique user ids
    user_ids = ratings_df.user_id.unique()
    # Create a 'dummy user' which interacts with all items
    mock_user_df = pd.DataFrame({'item_id': item_ids, 'user_id': 'mock-user'})
    # Create a 'dummy item' which is interacted with by all users
    mock_item_df = pd.DataFrame({'item_id': 'mock-item', 'user_id': user_ids})
    # DataFrame.append was removed in pandas 2.0, so combine the frames with pd.concat
    ratings_with_mocks_df = pd.concat([ratings_df, mock_user_df, mock_item_df])
    # Export the CSV
    ratings_with_mocks_df.to_csv('data/ratings-w-mocks.csv', quoting=csv.QUOTE_NONNUMERIC, index=True)

    Using this CSV, I successfully generated a CoreML model using CreateML.
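
Before exporting, it may be worth confirming that the mock rows really cover every item and every user. The sketch below is one way to check that; the function name and the toy data are hypothetical, and the default labels `mock-user` and `mock-item` simply mirror the ones used above.

```python
import pandas as pd

def mocks_cover_everything(df, mock_user="mock-user", mock_item="mock-item"):
    """Check that the mock user touches every real item and that
    the mock item is touched by every real user."""
    real_items = set(df.loc[df.item_id != mock_item, "item_id"])
    real_users = set(df.loc[df.user_id != mock_user, "user_id"])
    items_of_mock_user = set(df.loc[df.user_id == mock_user, "item_id"])
    users_of_mock_item = set(df.loc[df.item_id == mock_item, "user_id"])
    return real_items <= items_of_mock_user and real_users <= users_of_mock_item

# Toy interactions (hypothetical values) with and without the mock rows
ratings = pd.DataFrame({"item_id": ["i1", "i2", "i1"], "user_id": ["u1", "u1", "u2"]})
with_mocks = pd.concat([
    ratings,
    pd.DataFrame({"item_id": ["i1", "i2"], "user_id": "mock-user"}),
    pd.DataFrame({"item_id": "mock-item", "user_id": ["u1", "u2"]}),
])
```

Here `mocks_cover_everything(with_mocks)` should hold, while the same call on `ratings` alone should not.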