I developed a custom training job with sklearn 0.23 on Vertex AI and successfully deployed the model to an endpoint. However, when I call the endpoint, I get the following error:
raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.FailedPrecondition: 400 "Prediction failed: Exception during sklearn prediction: Expected 2D array, got 1D array instead:\narray=['instances'].\nReshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample."
The endpoint tells me that the correct format is:
{
  "instances": [
    { "instance_key_1": "value", ... }, ...
  ],
  "parameters": { "parameter_key_1": "value", ... }
}
I have the following code, which takes 5 examples and 71 columns from a dataframe df:
x = np.array(df.iloc[0:5, :-3].T)
instances_list = {"instances": [{coluna: valor for coluna, valor in zip(list(df.columns[0:-3]), list(df.iloc[0, 0:-3]))}]}
# instances = [json_format.ParseDict(s, Value()) for s in instances_list]
results = endpoint.predict(instances=instances_list)
My instances_list is formatted as follows:
{'instances': [{'ID_CONTRIBUINTE': '21327662000215', 'TOTAL_E12': '354032.54', 'TOTAL_PRODUTO_E12': '352693.82', 'TOTAL_INTERESTADUAIS_E12': '282.0', 'TOTAL_INTERNAS_E12': '353750.54'}]}
But it doesn't work. Sometimes I get an Unable to coerce value error, and sometimes the endpoint complains that it expects a 2D array.
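The "1D array" message can be reproduced locally: when a list of per-feature dicts is coerced with NumPy, the result is a 1-D array of Python objects rather than the 2-D numeric matrix sklearn validates for. A minimal standalone sketch (plain NumPy, not the endpoint code, with one example row from my data):

```python
import numpy as np

# A list of feature dicts coerces to a 1-D array of Python objects,
# which sklearn's input validation rejects ("Expected 2D array, got 1D array").
dict_instances = [{"ID_CONTRIBUINTE": "21327662000215", "TOTAL_E12": 354032.54}]
arr = np.array(dict_instances)
print(arr.shape, arr.dtype)   # (1,) object

# A list of lists coerces to the 2-D numeric array sklearn expects.
row_instances = [[354032.54, 352693.82], [282.0, 353750.54]]
arr2 = np.array(row_instances)
print(arr2.shape)             # (2, 2)
```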
I also followed the prediction format at https://codelabs.developers.google.com/codelabs/vertex-ai-custom-code-training#7 ; in that case, the code would be:
instances_list = {"instances": [list(i) for i in np.array(df.iloc[0:5, 0:-3])]}
But it returns the same error.
It looks like there are conflicting guidelines. The GCP console tells me that the payload format is a key-value pair:
instance_dict={ "instance_key_1": "value", ...}
Codelabs tells me to submit an array:
{
  "instances": [
    ["male", 29.8811345124283, 26.0, 1, "S", "New York, NY", 0, 0],
    ["female", 48.0, 39.6, 1, "C", "London / Paris", 0, 1]
  ]
}
Any ideas on how to overcome this issue?
I solved the problem. After preprocessing the data with ColumnTransformer, as in the task.py file that is part of the training package, I built a plain list of lists and submitted it to the endpoint successfully:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

preprocessor = ColumnTransformer(
    transformers=[
        ('bin', OrdinalEncoder(), BINARY_FEATURES),
        ('num', StandardScaler(), NUMERIC_FEATURES),
        ('cat', OneHotEncoder(handle_unknown='ignore'), CATEGORICAL_FEATURES)],
    n_jobs=-1)

x = preprocessor.fit_transform(df)
instances_list = [list(y) for y in x[0:5]]
results = endpoint.predict(instances=instances_list)
Output:
Prediction(predictions=[0.0, 0.0, 0.0, 0.0, 0.0], deployed_model_id='123456789', explanations=None)
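As I understand it, this works because `endpoint.predict(instances=...)` in the Python SDK builds the `{"instances": ...}` envelope itself, so wrapping the list in a dict yourself nests the envelope twice (which is where the coercion error came from). A sketch of the JSON body the prebuilt sklearn container ends up receiving (hypothetical values):

```python
import json

# Plain list of lists, as passed to endpoint.predict(instances=...).
instances_list = [[1.0, 29.881134, 26.0],
                  [0.0, 48.0, 39.6]]

# The SDK wraps it in the request envelope; doing so yourself duplicates it.
body = json.dumps({"instances": instances_list})
print(body)
```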
So, the correct format for prediction with a custom-trained sklearn model is a plain list of lists:
instances_list = [[1.0, 29.881134, 26.0, 1.0, 44.0, 88.0, 0.0, 0.0],
                  [0.0, 48.0, 39.6, 1.0, 22.0, 57.0, 0.0, 1.0]]
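One caveat worth flagging: in the snippet above the preprocessor is re-fitted on the prediction batch with `fit_transform`. For results consistent with training, the fitted preprocessor should be saved during training and only `transform` applied at prediction time. A hedged sketch under assumptions (the toy frame and feature names `sex`, `age`, `fare`, `embarked` are hypothetical stand-ins for df and the codelab's feature lists):

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

# Hypothetical feature lists; the real ones live in the codelab's task.py.
BINARY_FEATURES = ["sex"]
NUMERIC_FEATURES = ["age", "fare"]
CATEGORICAL_FEATURES = ["embarked"]

# Toy training frame standing in for df.
train = pd.DataFrame({
    "sex": ["male", "female", "male"],
    "age": [29.0, 48.0, 33.0],
    "fare": [26.0, 39.6, 10.5],
    "embarked": ["S", "C", "S"],
})

preprocessor = ColumnTransformer(
    transformers=[
        ("bin", OrdinalEncoder(), BINARY_FEATURES),
        ("num", StandardScaler(), NUMERIC_FEATURES),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL_FEATURES)])
preprocessor.fit(train)

# During training: persist the fitted preprocessor next to the model artifact.
path = os.path.join(tempfile.mkdtemp(), "preprocessor.joblib")
joblib.dump(preprocessor, path)

# At prediction time: reload it and only transform, so new rows are encoded
# and scaled with the statistics learned from the training data.
pre = joblib.load(path)
new_rows = train.iloc[0:2]
instances_list = [list(r) for r in pre.transform(new_rows)]
print(instances_list)  # plain list of lists, ready for endpoint.predict
```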