I split my dataset into X_train, Y_train, X_test and Y_test, and then I used gplearn's SymbolicRegressor.
I've already converted the string values in the DataFrame to float values.
But when I apply the SymbolicRegressor, I get this error:
ValueError: could not convert string to float: 'd'
Where 'd' is a value from Y.
Since all the values in Y_train and Y_test are alphabetic characters (they are the "labels"), I cannot understand why the SymbolicRegressor tries to convert them to float.
Any idea?
According to the gplearn documentation (https://gplearn.readthedocs.io/en/stable/index.html): "Symbolic regression is a machine learning technique that aims to identify an underlying mathematical expression that best describes a relationship". Pay attention to the word "mathematical". I am not an expert on this topic, and gplearn's description does not clearly define its area of applicability or its restrictions.
However, according to the source code (https://gplearn.readthedocs.io/en/stable/_modules/gplearn/genetic.html), the fit() method of the BaseSymbolic class contains the line X, y = check_X_y(X, y, y_numeric=True), where check_X_y() is sklearn.utils.validation.check_X_y(). The y_numeric argument means: "Whether to ensure that y has a numeric type. If dtype of y is object, it is converted to float64. Should only be used for regression algorithms".
So the y values must be numeric.
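To see this without running gplearn at all, here is a minimal sketch that calls the same validation function directly. The helper name validate_like_fit and the toy arrays are made up for illustration, and the labels are stored with dtype=object on the assumption that this is how they arrive from a DataFrame column of strings.

```python
import numpy as np
from sklearn.utils.validation import check_X_y


def validate_like_fit(X, y):
    """Run the validation call quoted above; return the error text or None.

    Hypothetical helper for illustration only, not part of gplearn.
    """
    try:
        check_X_y(X, y, y_numeric=True)
    except ValueError as exc:
        return str(exc)
    return None


# Toy data (assumed): four samples, two numeric features.
X = np.array([[0.1, 1.2], [0.4, 0.9], [1.5, 0.3], [2.0, 0.1]])

# String labels with object dtype, as in the question.
y_labels = np.array(["d", "a", "d", "b"], dtype=object)

# A numeric target, which is what a regressor expects.
y_numbers = np.array([3.0, 0.0, 3.0, 1.0])

label_error = validate_like_fit(X, y_labels)
number_error = validate_like_fit(X, y_numbers)

print("string labels :", label_error)
print("numeric target:", number_error)
```

The string labels are rejected during the conversion to float64, while the numeric target passes. Note that passing this check only means y is numeric; it does not by itself make regression the right tool for class labels.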