python google-cloud-platform named-entity-recognition google-cloud-automl google-natural-language

Error with ' '.join() when parsing txt for named entity recognition with the Google NLP API


I'm having a hard time constructing a dataset for Named Entity Recognition in the Google NLP API using input_helper_v2.py, a script provided by Google.

The problem comes with the function _DownloadGcsFile, as it throws this error:

gsutil_cp_cmd = ' '.join(['gsutil', 'cp', gcs_file, local_filename])
TypeError: sequence item 2: expected str instance, bytes found

I've tried b' '.join(['gsutil', 'cp', gcs_file, local_filename]) instead, but it leads to similar problems.

While searching for information, I noticed that the cause could be that the script was written for Python 2.7.

I'd appreciate any help, as I'm a complete beginner. Thank you so much.


Solution

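  • The error means that gcs_file is a bytes object, while the other items in the list are str. In Python 3, ' '.join() only accepts str items, and b' '.join() fails the same way in reverse because 'gsutil' and 'cp' are str. So you need to convert gcs_file to a string (str) before joining. For example: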

    gsutil_cp_cmd = ' '.join(['gsutil', 'cp', gcs_file.decode('utf-8'), local_filename])
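    If it is not clear which of the arguments arrive as bytes, a small helper can normalize all of them before joining. This is only a minimal sketch: to_text and build_gsutil_cp_cmd are hypothetical names, not part of input_helper_v2.py, the bucket path is made up, and UTF-8 is assumed as the encoding.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def build_gsutil_cp_cmd(gcs_file, local_filename):
    """Build the 'gsutil cp' command string from str or bytes arguments."""
    parts = ["gsutil", "cp", gcs_file, local_filename]
    # str.join() requires every item to be str, so normalize each one first.
    return " ".join(to_text(part) for part in parts)


# Hypothetical example: the GCS path is bytes, the local name is str.
print(build_gsutil_cp_cmd(b"gs://my-bucket/data.txt", "data.txt"))
# prints: gsutil cp gs://my-bucket/data.txt data.txt
```

    Inside _DownloadGcsFile, the same idea would amount to passing gcs_file and local_filename through the helper before the join, so the line works whether the caller supplies str or bytes.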