I'm training a CatBoost model and using a Pool object as follows:
pool = Pool(data=x_train, label=y_train, cat_features=cat_cols)
eval_set = Pool(data=x_validation, label=y_validation['Label'], cat_features=cat_cols)
model.fit(pool, early_stopping_rounds=EARLY_STOPPING_ROUNDS, eval_set=eval_set)
x_train, y_train, x_validation, and y_validation are Pandas DataFrames (the datasets are saved as Parquet files, and I use PyArrow to read them into the dataframes). model is a CatBoost classifier/regressor.
I'm trying to optimize for large datasets, and my question is: should I instead load the dataset from a libsvm file, as mentioned here? https://catboost.ai/docs/concepts/python-usages-examples.html#load-the-dataset-from-a-file
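For reference, a minimal sketch of what a libsvm file looks like (the format the linked CatBoost docs can load from a path). Each row is a label followed by 1-based index:value pairs for the non-zero features. The helper name and the file name train.libsvm are my own illustration, not anything from CatBoost:

```python
# Hypothetical sketch of writing data in libsvm format.
# According to the linked docs, CatBoost can then load such a file,
# e.g. catboost.Pool("libsvm://train.libsvm") in recent versions.
def to_libsvm_line(label, features):
    """Render one sample as a libsvm line; feature indices are 1-based,
    and zero-valued features are omitted (sparse encoding)."""
    pairs = " ".join(f"{i + 1}:{v}" for i, v in enumerate(features) if v != 0)
    return f"{label} {pairs}".strip()

rows = [
    (1, [0.5, 0.0, 2.0]),
    (0, [0.0, 1.5, 0.0]),
]
lines = [to_libsvm_line(label, feats) for label, feats in rows]
with open("train.libsvm", "w") as f:
    f.write("\n".join(lines) + "\n")
```

Whether this is faster than building a Pool from an in-memory DataFrame likely depends on dataset size and how the categorical columns are declared in the file's column description.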