I have a dataset with people's characteristics, and I need to predict their breakfast. Here's an example of df.
I am training a CatBoost classifier for that.
Is it possible in my case to predict not only one kind of breakfast, but also an additional one?
By additional I mean the second most appealing type of breakfast for a person.
# I started with this:
from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split

# Hold out 15% for testing, then 15% of the remainder for validation
df_train, df_test = train_test_split(df, test_size=0.15, random_state=42)
df_train, df_valid = train_test_split(df_train, test_size=0.15, random_state=42)
features_train = df_train.drop(['breakfast'], axis=1)
target_train = df_train['breakfast']
features_valid = df_valid.drop(['breakfast'], axis=1)
target_valid = df_valid['breakfast']
features_test = df_test.drop(['breakfast'], axis=1)
target_test = df_test['breakfast']
model_cat = CatBoostClassifier(random_state=42)
model_cat.fit(features_train, target_train)
valid_predictions_tree = model_cat.predict(features_valid)
# But this trains for a single categorical output, whereas I need the two best results, not one.
Using predict_proba instead of predict returns the probability of every class of your target:
valid_predictions_tree = model_cat.predict_proba(features_valid)
To get clean, labeled predictions for an input dt, you can do this:
proba = pd.DataFrame(model_cat.predict_proba(dt), columns=model_cat.classes_)
Output example:
Class1 Class2 Class3
0.2 0.5 0.3
0.7 0.2 0.1
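Each row sums to 1 (100%), so the second most appealing breakfast for a person is the class with the second-highest probability in that row.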
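To pull the two best classes out of those probabilities, one option is to sort each row and keep the two highest columns. The following is only a minimal sketch: the helper name top_k_classes is hypothetical, and the class names and probabilities are the made-up values from the output example, not real model output.

```python
import numpy as np

def top_k_classes(proba, classes, k=2):
    """Return the k most probable class labels per row, best first."""
    proba = np.asarray(proba)
    classes = np.asarray(classes)
    # argsort is ascending, so reverse each row and keep the first k columns
    order = np.argsort(proba, axis=1)[:, ::-1][:, :k]
    return classes[order]

# Hypothetical values mirroring the output example
classes = ["Class1", "Class2", "Class3"]
proba = [[0.2, 0.5, 0.3],
         [0.7, 0.2, 0.1]]

top2 = top_k_classes(proba, classes, k=2)
print(top2)  # first column: best class, second column: runner-up
```

With the fitted model from the question, the same helper should work as top_k_classes(model_cat.predict_proba(features_valid), model_cat.classes_), assuming predict_proba returns its columns in the order of model_cat.classes_.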