google-cloud-platformgoogle-bigquerybq

How can I import CSV data with newline characters in its values into a GCloud Big Query table?


My bq load command and error

$ bq load --replace --source_format=CSV --autodetect temp.BQTable ./source.csv
Upload complete.
Waiting on bqjob_123 ... (0s) Current status: DONE
BigQuery error in load operation: Error processing job 'my-project-id-123:bqjob_123': Error
while reading data, error message: Error detected while parsing row starting at position: 333. Error: Missing close
double quote (") character.
Failure details:
- Error while reading data, error message: CSV processing encountered
too many errors, giving up. Rows: 0; errors: 1; max bad: 0; error
percent: 0

Excel and my node.js module for Csv serialization can process data with values with newlines just fine.

Papaparse, my npm module for serializing, uses '\r\n' as the default newline to separate rows so there shouldn't be any ambiguity there.

Here's some sample csv data, ie, source.csv

id,name
1,"this value
includes a newline"

bq and Cloud SDK versions

$ gcloud components update


Your current Cloud SDK version is: 289.0.0
You will be upgraded to version: 337.0.0

┌─────────────────────────────────────────────────────────────────────────────┐
│                      These components will be updated.                      │
├─────────────────────────────────────────────────────┬────────────┬──────────┤
│                         Name                        │  Version   │   Size   │
├─────────────────────────────────────────────────────┼────────────┼──────────┤
│ BigQuery Command Line Tool                          │     2.0.67 │  < 1 MiB │
│ BigQuery Command Line Tool (Platform Specific)      │     2.0.65 │  < 1 MiB │
│ Cloud SDK Core Libraries                            │ 2021.04.16 │ 18.0 MiB │
│ Cloud SDK Core Libraries (Platform Specific)        │ 2021.03.12 │  < 1 MiB │
│ Cloud Storage Command Line Tool                     │       4.61 │  3.9 MiB │
│ Cloud Storage Command Line Tool (Platform Specific) │       4.59 │  < 1 MiB │
│ gcloud cli dependencies                             │ 2021.04.16 │ 10.8 MiB │
│ gcloud cli dependencies                             │ 2021.04.16 │  < 1 MiB │
└─────────────────────────────────────────────────────┴────────────┴──────────┘

I was on version 289 because they patched newer version to not allow you to import a file (CSV, JSON, etc) of size 100 MB or bigger into BQ using bq load :(


Solution

  • Add the flag below to your command:

    --allow_quoted_newlines=true
    

    Not sure if this will work with \r\n, it definetly work for me for \n encoded newlines.

    Hope it helps.