
How to prepare a CSV file for AutoML entity extraction on GCP?


I have created JSONL files in the format specified by Google and uploaded them to Cloud Storage.

I prepared a CSV file whose first column holds the path to the JSONL file (gs://*example/file.jsonl) and whose second column holds 'TRAIN', 'VALIDATE', or 'TEST'.

I got an error saying 'Cannot find the referenced file: TRAIN in request.'

How should I prepare the CSV file?


Solution

  • Sounds like you have the column order backwards. The "ML Use" column should come first and the GCS URI second. With the columns swapped, the importer appears to read TRAIN as a file path, which would explain the error. See the example CSV file from the Quickstart:

    https://cloud.google.com/natural-language/automl/entity-analysis/docs/quickstart

    gs://cloud-ml-data/NL-entity/dataset.csv

    https://console.cloud.google.com/storage/browser/cloud-ml-data/NL-entity/?_ga=2.132412110.-1530629862.1558449111

    $ cat Downloads/NL-entity_dataset.csv 
    TRAIN,gs://cloud-ml-data/NL-entity/train.jsonl
    TEST,gs://cloud-ml-data/NL-entity/test.jsonl
    VALIDATION,gs://cloud-ml-data/NL-entity/validation.jsonl
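
If you generate the CSV from a script, a small helper can keep the column order right. The following is only a sketch: `build_import_csv` and `ALLOWED_ML_USE` are hypothetical names, not part of any Google library. It assumes the accepted labels are the three that appear in the Quickstart file above. Note that the Quickstart file uses VALIDATION rather than VALIDATE.

```python
import csv
import io

# Assumption: these are the labels seen in the Quickstart CSV above.
ALLOWED_ML_USE = {"TRAIN", "VALIDATION", "TEST"}


def build_import_csv(entries):
    """Return CSV text with one 'ML use,GCS URI' row per (ml_use, gcs_uri) pair."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for ml_use, gcs_uri in entries:
        # Fail early if the columns are swapped or the label is misspelled.
        if ml_use not in ALLOWED_ML_USE:
            raise ValueError(f"unexpected ML use label: {ml_use!r}")
        if not gcs_uri.startswith("gs://"):
            raise ValueError(f"expected a gs:// URI, got: {gcs_uri!r}")
        writer.writerow([ml_use, gcs_uri])
    return buffer.getvalue()


csv_text = build_import_csv([
    ("TRAIN", "gs://cloud-ml-data/NL-entity/train.jsonl"),
    ("TEST", "gs://cloud-ml-data/NL-entity/test.jsonl"),
    ("VALIDATION", "gs://cloud-ml-data/NL-entity/validation.jsonl"),
])
print(csv_text, end="")
```

Passing the pairs in the order from the question (path first, label second) raises a `ValueError` here instead of producing a file that fails on import.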