I'm trying to denormalize data in Dataprep so that I can use it in BigQuery.
More specifically, I'd like to turn entries in an account_profile table linked to my account table with with foreign key 'account_id' into an array in my account table. (Account_profile stores contact methods... bad name, I know.)
In dataprep, I've
The problem is that when I try to unnest that column in BigQuery, or do any other array-like operation in BigQuery, I get an error like this: "Values referenced in UNNEST must be arrays."
My data looks good. For example, here is a row.
[{"profile_identifier":"ttcuongem+29@gmail.com","verification_code":"abc789","enabled":true,"id1":2818},{"profile_identifier":"xyz123,"enabled":false,"id1":2874}]
I can't find a way to make BigQuery see this as an array, nor can I find a way to make Dataprep create this kind of data as an array rather than a string. The only solutions people have posted are very specific hacks that wouldn't apply to this generic case.
I feel that I'm following denormalization best practices and am surprised that this gap exists in the Google ELT toolchain. What am I missing?
Below is for BigQuery Standard SQL
You can use recently introduced JSON_EXTRACT_ARRAY function for this as in example below
#standardSQL
WITH `project.dataset.table` AS (
SELECT '''[
{"profile_identifier":"ttcuongem+29@gmail.com","verification_code":"abc789","enabled":true,"id1":2818},
{"profile_identifier":"xyz123","enabled":false,"id1":2874}
]''' string_col
)
SELECT JSON_EXTRACT_ARRAY(string_col) AS arr_col
FROM `project.dataset.table`
with output
Row arr_col
1 {"profile_identifier":"ttcuongem+29@gmail.com","verification_code":"abc789","enabled":true,"id1":2818}
{"profile_identifier":"xyz123","enabled":false,"id1":2874}