I have a series of .txt files that I need to upload to Google Cloud in order to create tables from them in BigQuery. The files are tables whose separator is ';'.
For some reason, BigQuery seems to have trouble identifying the columns (automatically or otherwise) when the files are in this shape, even when ';' is specified as the separator.
I generated a sample table, find-and-replaced the ';' with ',', and saved the file as .csv. Now BigQuery creates the table without any issue.
Question: should I find-and-replace all the separators in all the tables, or am I missing something?
If yes, how can I do the substitution from the command prompt on OS 10 (e.g. with sed)? (The files are large, and substituting the character is slow even with UltraEdit.)
Best!
To address this issue, you can upload such file(s) as having just one column, and then, once the data is in BigQuery, parse it with SQL into whatever schema you need. I think this is the most generic and easiest way to deal with this. Usually SPLIT(col, ';') would be used, as in the quite simplified example below, or REGEXP_EXTRACT_ALL() in more complex cases.
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'aa,a;bbb;ccc' col
)
SELECT
SPLIT(col, ';')[SAFE_OFFSET(0)] col1,
SPLIT(col, ';')[SAFE_OFFSET(1)] col2,
SPLIT(col, ';')[SAFE_OFFSET(2)] col3
FROM `project.dataset.table`
with the result:

Row  col1  col2  col3
1    aa,a  bbb   ccc
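For the more complex cases mentioned above, a REGEXP_EXTRACT_ALL() variant might look like the sketch below (same hypothetical table; the pattern r'[^;]+' captures each run of non-semicolon characters, which here gives the same result as SPLIT):

```sql
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'aa,a;bbb;ccc' col
)
SELECT
  REGEXP_EXTRACT_ALL(col, r'[^;]+')[SAFE_OFFSET(0)] col1,
  REGEXP_EXTRACT_ALL(col, r'[^;]+')[SAFE_OFFSET(1)] col2,
  REGEXP_EXTRACT_ALL(col, r'[^;]+')[SAFE_OFFSET(2)] col3
FROM `project.dataset.table`
```

The regex becomes useful once fields can be quoted or the delimiter can appear escaped inside a field, which plain SPLIT cannot handle.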
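As for the command-line part of the question: on macOS a streaming tool such as tr (or BSD sed) handles large files without loading them into memory, unlike a GUI editor. A minimal sketch, with a stand-in file name, and with the caveat that blindly swapping ';' for ',' will corrupt any row whose fields already contain commas (like 'aa,a' above):

```shell
# Stand-in for one of the large .txt files (hypothetical name/content)
printf 'aa,a;bbb;ccc\n' > table.txt

# Stream-replace every ';' with ',' -- fast even on very large files
tr ';' ',' < table.txt > table.csv

# Alternative, in-place with BSD sed (macOS requires the empty '' after -i):
# sed -i '' 's/;/,/g' table.txt

cat table.csv   # aa,a,bbb,ccc -- note the original comma is now ambiguous
```

That ambiguity is exactly why the one-column-then-SPLIT approach above is safer than rewriting the delimiter.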