Tags: google-cloud-platform, google-cloud-vertex-ai, google-cloud-ai, google-cloud-aiplatform

Format issue when calling Vertex AI Custom Job Endpoint


I developed a custom training job with sklearn 0.23 in Vertex AI and successfully deployed it to an endpoint. However, when I call the endpoint, I get the following error:

raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.FailedPrecondition: 400 "Prediction failed: Exception during sklearn prediction: Expected 2D array, got 1D array instead:\narray=['instances'].\nReshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample."

The endpoint tells me that the correct format is:

{
  "instances": [
    { "instance_key_1": "value", ... }, ...
  ],
  "parameters": { "parameter_key_1": "value", ... }, ...
}
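
For reference, this is how I understand that schema maps to a raw REST call (a minimal sketch, not code from my project; the region, project and endpoint ID are placeholders and the instance key is just the documented example):

import google.auth
import google.auth.transport.requests
import requests

# Placeholders: replace the region and ENDPOINT_ID with real values
credentials, project_id = google.auth.default()
credentials.refresh(google.auth.transport.requests.Request())

url = ("https://us-central1-aiplatform.googleapis.com/v1/"
       f"projects/{project_id}/locations/us-central1/endpoints/ENDPOINT_ID:predict")
body = {"instances": [{"instance_key_1": "value"}]}

response = requests.post(url, json=body,
                         headers={"Authorization": f"Bearer {credentials.token}"})
print(response.json())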

I have the following code, which takes 5 examples and 71 columns from a dataframe df:

import numpy as np

# 5 examples, dropping the last 3 columns
x = np.array(df.iloc[0:5, :-3].T)

# One dict per instance, mapping column name -> value (only the first row here)
instances_list = {"instances": [{coluna: valor for coluna, valor in
                                 zip(list(df.columns[0:-3]), list(df.iloc[0, 0:-3]))}]}

# instances = [json_format.ParseDict(s, Value()) for s in instances_list]

results = endpoint.predict(instances=instances_list)
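
As an aside, the commented-out ParseDict line would iterate over the keys of the outer dict rather than over the instance dicts themselves; if the protobuf conversion is actually needed (for example with the lower-level PredictionServiceClient), a sketch of what I believe it should look like is:

from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value

# Convert each instance dict (not the outer {"instances": ...} mapping) to a protobuf Value
proto_instances = [json_format.ParseDict(d, Value()) for d in instances_list["instances"]]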

My instances_list is formatted as follows:

{'instances': [{'ID_CONTRIBUINTE': '21327662000215', 'TOTAL_E12': '354032.54', 'TOTAL_PRODUTO_E12': '352693.82', 'TOTAL_INTERESTADUAIS_E12': '282.0', 'TOTAL_INTERNAS_E12': '353750.54'}]}

But it doesn't work. Sometimes I get an "Unable to coerce value" error, and sometimes the endpoint complains that it expects a 2D array.

I also followed the prediction format at https://codelabs.developers.google.com/codelabs/vertex-ai-custom-code-training#7 ; in that case, the code would be:

instances_list = {"instances":[valor for valor in [list(i) for i in np.array(df.iloc[0:5,0:-3])]]}

But it returns the same error.
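
One detail I am not sure about (a sketch, not a verified fix): the Python SDK's endpoint.predict() already wraps its instances argument in the {"instances": ...} envelope, so the nested dict above may be double-wrapping the payload. Passing the rows directly would look like this, although they are still the raw, unpreprocessed dataframe values:

# Plain list of rows (list of lists); endpoint.predict adds the "instances" envelope itself
rows = df.iloc[0:5, 0:-3].values.tolist()
results = endpoint.predict(instances=rows)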

It looks like there are conflicting guidelines. The GCP console tells me that the payload format is a key-value pair:

instance_dict={ "instance_key_1": "value", ...}

The codelab tells me to submit an array:

{
    "instances": [
      ["male", 29.8811345124283, 26.0, 1, "S", "New York, NY", 0, 0], 
      ["female", 48.0, 39.6, 1, "C", "London / Paris", 0, 1]]
}

Any ideas on how to overcome this issue?


Solution

  • I solved the problem. After preprocessing the data with the same ColumnTransformer used in the task.py file that is part of the training package, I created a list of lists and successfully submitted it to the endpoint.

    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import OrdinalEncoder, StandardScaler, OneHotEncoder

    # Same preprocessing as in task.py (part of the training package)
    preprocessor = ColumnTransformer(
            transformers=[
                ('bin', OrdinalEncoder(), BINARY_FEATURES),
                ('num', StandardScaler(), NUMERIC_FEATURES),
                ('cat', OneHotEncoder(handle_unknown='ignore'), CATEGORICAL_FEATURES)],
            n_jobs=-1)

    x = preprocessor.fit_transform(df)

    # First 5 preprocessed rows as a plain list of lists
    instances_list = [list(y) for y in x[0:5]]

    results = endpoint.predict(instances=instances_list)
    

    Output:

    Prediction(predictions=[0.0, 0.0, 0.0, 0.0, 0.0], deployed_model_id='123456789', explanations=None)
    

    So, the correct format for prediction with a custom sklearn training job is:

    instances_list = [[1.0, 29.881134, 26.0, 1.0, 44.0, 88.0, 0.0, 0.0],
                      [0.0, 48.0, 39.6, 1.0, 22.0, 57.0, 0.0, 1.0]]
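
    In practice the preprocessor fitted during training should be reused at prediction time rather than refitted on the prediction dataframe. A minimal sketch, assuming the fitted ColumnTransformer was saved with joblib as preprocessor.joblib during training (the file name is only an example):

    import joblib

    # Assumption: the ColumnTransformer fitted in task.py was saved with
    # joblib.dump(preprocessor, 'preprocessor.joblib') alongside the model
    preprocessor = joblib.load('preprocessor.joblib')

    # transform (not fit_transform) so prediction data gets the exact same encoding
    x_new = preprocessor.transform(df)
    instances_list = [list(row) for row in x_new[0:5]]
    results = endpoint.predict(instances=instances_list)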