Toy datasets are useful for sharing reproducible issues. I would like to easily create image datasets on Vertex AI from open-source data.
For example, Keras provides some public datasets (boston_housing, cifar10, cifar100, fashion_mnist, imdb, mnist, reuters).
How can one of them be loaded easily into a Vertex AI image dataset, for example with gcloud commands and/or a Python script?
Assuming you have GCP credentials to perform the following actions, a single-label image dataset can be created on Vertex AI with the following commands.
$ pip install cifar2png
$ cifar2png cifar10 cifar10_png
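Alternatively, since the question mentions Keras, the same PNG tree can be produced directly from keras.datasets. A minimal sketch, assuming TensorFlow and Pillow are installed (the class names follow the standard CIFAR-10 label order):

import os
from PIL import Image
from tensorflow.keras.datasets import cifar10

# Standard CIFAR-10 class names, indexed by label.
LABELS = ["airplane", "automobile", "bird", "cat", "deer",
          "dog", "frog", "horse", "ship", "truck"]

# Keep only the test split, to mirror the cifar2png layout used below.
(_, _), (x_test, y_test) = cifar10.load_data()
for i, (image, label) in enumerate(zip(x_test, y_test)):
    class_dir = os.path.join("cifar10_png", "test", LABELS[int(label[0])])
    os.makedirs(class_dir, exist_ok=True)
    Image.fromarray(image).save(os.path.join(class_dir, f"{i:05d}.png"))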
$ BUCKET_NAME="your_bucket_name"
$ gsutil -m -q cp -r cifar10_png/test gs://${BUCKET_NAME}/cifar10_png/test
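Vertex AI imports image data from an import file (CSV or JSONL) listing each image URI with its label, rather than from a raw directory of PNGs. A minimal sketch to build such a CSV from the directory layout above (the file name test.csv is an assumption; adapt the paths to your setup):

import csv
import pathlib

BUCKET_NAME = "your_bucket_name"  # same bucket as above

# One "gs://.../image.png,label" row per image; the label is the
# name of the class sub-directory created by cifar2png.
with open("test.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for png in sorted(pathlib.Path("cifar10_png/test").glob("*/*.png")):
        uri = f"gs://{BUCKET_NAME}/cifar10_png/test/{png.parent.name}/{png.name}"
        writer.writerow([uri, png.parent.name])

$ gsutil cp test.csv gs://${BUCKET_NAME}/cifar10_png/test.csv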
$ LOCATION="your_region" # e.g. us-central1
$ PROJECT_ID="your_project_id"
$ curl -X POST "https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/datasets" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d '{"display_name": "<replace_by_your_table_name>", "metadata_schema_uri": "gs://google-cloud-aiplatform/schema/dataset/metadata/image_1.0.0.yaml"}'
$ DATASET_ID="your_dataset_id"
$ curl -X POST "https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/datasets/${DATASET_ID}:import" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d '{"import_configs": [{"gcs_source": {"uris": "gs://<replace_by_your_bucket_name>/cifar10_png/test"}, "import_schema_uri" : "gs://google-cloud-aiplatform/schema/dataset/ioformat/image_classification_single_label_io_format_1.0.0.yaml"}]}'