I'm new to Google Cloud Platform and I'm trying to create a Feature Store and fill it with values from a CSV file stored in Google Cloud Storage. The aim is to do that from a local notebook in Python. I'm basically following the code here, making the appropriate changes since I'm working with the credit card public dataset. The error raised when I run the code is the following:
GoogleAPICallError: None Unexpected state: Long-running operation had neither response nor error set.
and it happens during the ingestion of the data from the CSV file.
Here is the code I'm working on:
import os
from datetime import datetime
from google.cloud import bigquery
from google.cloud import aiplatform
from google.cloud.aiplatform_v1.types import feature as feature_pb2
from google.cloud.aiplatform_v1.types import featurestore as featurestore_pb2
from google.cloud.aiplatform_v1.types import featurestore_service as featurestore_service_pb2
from google.cloud.aiplatform_v1.types import entity_type as entity_type_pb2
from google.cloud.aiplatform_v1.types import FeatureSelector, IdMatcher
credential_path = r"C:\Users\...\.json"
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = credential_path
## Constants
PROJECT_ID = "my-project-ID"
REGION = "us-central1"
API_ENDPOINT = "us-central1-aiplatform.googleapis.com"
INPUT_CSV_FILE = "my-input-file.csv"
FEATURESTORE_ID = "fraud_detection"
## Output dataset
DESTINATION_DATA_SET = "fraud_predictions"
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
DESTINATION_DATA_SET = "{prefix}_{timestamp}".format(
prefix=DESTINATION_DATA_SET, timestamp=TIMESTAMP
)
## Output table. Make sure that the table does NOT already exist;
## the BatchReadFeatureValues API cannot overwrite an existing table
DESTINATION_TABLE_NAME = "training_data"
DESTINATION_PATTERN = "bq://{project}.{dataset}.{table}"
DESTINATION_TABLE_URI = DESTINATION_PATTERN.format(
    project=PROJECT_ID, dataset=DESTINATION_DATA_SET,
    table=DESTINATION_TABLE_NAME
)
## Create dataset
client = bigquery.Client(project=PROJECT_ID)
dataset_id = "{}.{}".format(client.project, DESTINATION_DATA_SET)
dataset = bigquery.Dataset(dataset_id)
dataset.location = REGION
dataset = client.create_dataset(dataset)
print("Created dataset {}.{}".format(client.project, dataset.dataset_id))
## Create client for CRUD and data_client for reading feature values.
client = aiplatform.gapic.FeaturestoreServiceClient(
    client_options={"api_endpoint": API_ENDPOINT})
data_client = aiplatform.gapic.FeaturestoreOnlineServingServiceClient(
    client_options={"api_endpoint": API_ENDPOINT})
BASE_RESOURCE_PATH = client.common_location_path(PROJECT_ID, REGION)
## Create featurestore (only the first time)
create_lro = client.create_featurestore(
    featurestore_service_pb2.CreateFeaturestoreRequest(
        parent=BASE_RESOURCE_PATH,
        featurestore_id=FEATURESTORE_ID,
        featurestore=featurestore_pb2.Featurestore(
            online_serving_config=featurestore_pb2.Featurestore.OnlineServingConfig(
                fixed_node_count=1
            ),
        ),
    )
)
## Wait for LRO to finish and get the LRO result.
print(create_lro.result())
client.get_featurestore(
    name=client.featurestore_path(PROJECT_ID, REGION, FEATURESTORE_ID)
)
## Create credit card entity type (only the first time)
cc_entity_type_lro = client.create_entity_type(
    featurestore_service_pb2.CreateEntityTypeRequest(
        parent=client.featurestore_path(PROJECT_ID, REGION, FEATURESTORE_ID),
        entity_type_id="creditcards",
        entity_type=entity_type_pb2.EntityType(
            description="Credit card entity",
        ),
    )
)
## Create fraud entity type (only the first time)
fraud_entity_type_lro = client.create_entity_type(
    featurestore_service_pb2.CreateEntityTypeRequest(
        parent=client.featurestore_path(PROJECT_ID, REGION, FEATURESTORE_ID),
        entity_type_id="frauds",
        entity_type=entity_type_pb2.EntityType(
            description="Fraud entity",
        ),
    )
)
## Create features for credit card type (only the first time)
client.batch_create_features(
    parent=client.entity_type_path(PROJECT_ID, REGION, FEATURESTORE_ID, "creditcards"),
    requests=[
        featurestore_service_pb2.CreateFeatureRequest(
            feature=feature_pb2.Feature(
                value_type=feature_pb2.Feature.ValueType.DOUBLE, description="",
            ),
            feature_id=feature_id,
        )
        # v1..v28 plus amount: all 29 requests are identical apart from
        # the ID, so build them in a loop instead of spelling them out.
        for feature_id in [f"v{i}" for i in range(1, 29)] + ["amount"]
    ],
).result()
## Create features for fraud type (only the first time)
client.batch_create_features(
    parent=client.entity_type_path(PROJECT_ID, REGION, FEATURESTORE_ID, "frauds"),
    requests=[
        featurestore_service_pb2.CreateFeatureRequest(
            feature=feature_pb2.Feature(
                value_type=feature_pb2.Feature.ValueType.DOUBLE, description="",
            ),
            feature_id="class",
        ),
    ],
).result()
## Import features values for credit cards
import_cc_request = aiplatform.gapic.ImportFeatureValuesRequest(
    entity_type=client.entity_type_path(
        PROJECT_ID, REGION, FEATURESTORE_ID, "creditcards"),
    csv_source=aiplatform.gapic.CsvSource(gcs_source=aiplatform.gapic.GcsSource(
        uris=["gs://fraud-detection-19102021/dataset/cc_details_train.csv"])),
    entity_id_field="cc_id",
    feature_specs=[
        # Same 29 feature IDs as created above.
        aiplatform.gapic.ImportFeatureValuesRequest.FeatureSpec(id=feature_id)
        for feature_id in [f"v{i}" for i in range(1, 29)] + ["amount"]
    ],
    feature_time_field="time",
    worker_count=1,
)
## Start to import
ingestion_lro = client.import_feature_values(import_cc_request)
## Polls for the LRO status and prints when the LRO has completed
ingestion_lro.result()
## Import features values for frauds
import_fraud_request = aiplatform.gapic.ImportFeatureValuesRequest(
    entity_type=client.entity_type_path(
        PROJECT_ID, REGION, FEATURESTORE_ID, "frauds"),
    csv_source=aiplatform.gapic.CsvSource(gcs_source=aiplatform.gapic.GcsSource(
        uris=["gs://fraud-detection-19102021/dataset/data_fraud_train.csv"])),
    entity_id_field="fraud_id",
    feature_specs=[
        aiplatform.gapic.ImportFeatureValuesRequest.FeatureSpec(id="class"),
    ],
    feature_time_field="time",
    worker_count=1,
)
## Start to import
ingestion_lro = client.import_feature_values(import_fraud_request)
## Polls for the LRO status and prints when the LRO has completed
ingestion_lro.result()
When I check the Ingestion Jobs in the Features section of the Google Cloud Console, I see that the job has finished, but no values are added to my features.
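One way to double-check this from code, using the data_client created above together with the FeatureSelector and IdMatcher imports at the top (a minimal sketch; "100" is a hypothetical entity ID, replace it with one from your CSV):
from google.cloud.aiplatform_v1.types import featurestore_online_service as featurestore_online_service_pb2
## Read a couple of features back online; empty values confirm
## that the ingestion wrote nothing.
read_request = featurestore_online_service_pb2.ReadFeatureValuesRequest(
    entity_type=client.entity_type_path(
        PROJECT_ID, REGION, FEATURESTORE_ID, "creditcards"),
    entity_id="100",  # hypothetical entity ID
    feature_selector=FeatureSelector(id_matcher=IdMatcher(ids=["v1", "amount"])),
)
print(data_client.read_feature_values(read_request))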
Any advice is really precious.
Thank you all.
EDIT 1
In the image below there is an example of the first row of the CSV file I used as input (cc_details_train.csv). All the unseen features are similar; the feature class can assume the values 0 or 1.
The ingestion job takes about 5 minutes to import (ideally) 3000 rows, but it ends without error and without importing any value.
Vertex AI recommendations when using CSV to import values via ImportFeatureValuesRequest
It's possible that when using this feature you end up unable to import any data at all. You must pay attention to the time field you are using, as it must comply with Google's timestamp format (RFC 3339 / ISO 8601, e.g. 2021-04-15T08:28:14Z).
Note: In my test I found that if I do not set up the time field in Google's recommended date format, the job will just not upload any feature values at all.
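If your source CSV stores time differently (the Kaggle credit card dataset, for instance, stores Time as seconds elapsed from the first transaction), you can rewrite the column before ingesting. A minimal sketch with pandas; the base date and file names are assumptions to adapt to your data:
import pandas as pd

df = pd.read_csv("cc_details_train.csv")
## Assumption: 'time' holds seconds elapsed from an arbitrary start;
## anchor it to a reference date and emit RFC 3339 UTC timestamps,
## which the feature_time_field accepts.
base = pd.Timestamp("2021-04-15T00:00:00Z")  # hypothetical reference date
df["time"] = (base + pd.to_timedelta(df["time"], unit="s")).dt.strftime(
    "%Y-%m-%dT%H:%M:%SZ")
df.to_csv("cc_details_train_fixed.csv", index=False)
With the time column rewritten, a file like the following ingests correctly: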
test.csv
cc_id,time,v1,v2,v3,v4,v5,v6,v7,v8,v9,v10,v11,v12,v13,v14,v15,v16,v17,v18,v19,v20,v21,v22,v23,v24,v25,v26,v27,v28,amount
100,2021-04-15T08:28:14Z,-1.359807,-0.072781,2.534897,1.872351,2.596267,0.465238,0.923123,0.347986,0.987354,1.234657,2.128645,1.958237,0.876123,-1.712984,-0.876436,1.74699,-1.645877,-0.936121,1.456327,0.087623,1.900872,2.876234,1.874123,0.923451,0.123432,0.000012,1.212121,0.010203,1000
output:
imported_entity_count: 1
imported_feature_value_count: 29
About optimization and working with features
You can check the official documentation here to see the minimum and maximum number of records recommended for processing. As a piece of advice, only ingest the features you actually work with, and stay within the recommended number of values.
See your running ingestion job
Whether you use the Vertex AI UI or code to create the ingestion job, you can track its run in the UI under this path:
Vertex AI > Features > View Ingestion Jobs
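You can also check the outcome programmatically: the ImportFeatureValuesResponse returned by the ingestion LRO carries counters, and rows dropped for bad data (such as an unparseable time field) are counted as invalid rows instead of failing the operation. A short sketch against the ingestion_lro from the question:
response = ingestion_lro.result()
## Zero imported values plus a non-zero invalid_row_count points
## at malformed rows, typically a bad time field.
print("imported_entity_count:", response.imported_entity_count)
print("imported_feature_value_count:", response.imported_feature_value_count)
print("invalid_row_count:", response.invalid_row_count)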