I have two differently structured JSON files being loaded through Snowpipe. The only difference is that one of them uses many nested arrays where the other uses a nested dict. I am trying to figure out how to transform structure 1 into one finalized table. I've successfully transformed structure 2 into a table and included the code below.
I know I need to use LATERAL FLATTEN, but I have not been successful.
**Structure 1: Nested Arrays (Need help on)**
This JSON lives in a table, in column **JSONTEXT**
[
{
"ID": "xxx-xxxx-xxxx xxx-xxx",
"caseTypeID": "xx-xxxx-xxxx-xxxxx",
"content": {
"AccountID": "xx-xxxxx-xxxx-xxxx xxxx-xxxxx",
"AccountName": "XXXX",
"Address": {
"pxObjClass": "Data-Address-Postal"
},
"Addresses": [],
"AllKickoffsComplete": "true",
"BillingContactList": [],
"ClientCurrency": "USD",
"ClientID": "XXXXXX",
"ClientNSID": "XXXXXXXX-00",
"ClientName": "XXXXX XXXX Inc.",
"CompanyPhoneNumber": "XXX-XXX-XXXX",
"CrmSearchOrg": "XXXX",
"EEList": [
{
"AccountID": "xxx-xxxxx-xxxx-xxxxx xxxx-xxxxx",
"AccountName": "XXXX",
"AllowanceList": [
{
"AllowanceAmount": "327",
"AllowanceName": "Car Allowance",
"pxObjClass": "xxxxx-xxxxx-xxxxx"
]
**Structure 2: Nested Dict**
This JSON lives in a table, in column **JSONTEXT**
[
{
"OppID": "xxxx-xxxxx",
"pxObjClass": "xx-xxxxx-xxxx-xxxxxx",
"pxPages": {
"EEList": {
"Country": "xxx",
"CountryName": "xxx",
"Currency": "xxx",
"EstimatedICPCost": "xxxxxxxxxxx",
"ICPCurrency": "xxxxx",
"ICPID": "xxxxxxxxx.",
"ICPNSID": "xxxx-xx",
"ICPName": "xxx xx xx.",
"LocalMonthlySalary": "xxxxxx",
"MinFee": "xxxx",
"MonthlyGrossCost": "xxxxx",
"NewOrRepeatCustomer": "xxxxx",
"OppCloseDate": "xxx-xxx-xx",
"OppID": "xxx-xxxx",
"OpportunityName": "xxx - xxx xxx - xxx - xxxx",
"ReferralSource": "xxxxxx",
"pxObjClass": "Index-xx-xxxx-xxxx-xxxxxx",
"pxSubscript": "EEList"
}
},
"pyID": "xxxxxx",
"pzInsKey": "xxxx-xxxx-xxxx xxxxx-xxx"
},
]
Here is my code for the second structure that works.
create or replace table xxxx
as select
value:ID::varchar as ID,
value:caseTypeID::varchar as caseTypeID,
value:content:AccountID::varchar as AccountID,
value:content:AccountName::varchar as AccountName,
value:content:AllKickoffsComplete::boolean as AllKickoffsComplete,
value:content:ClientCurrency::varchar as ClientCurrency,
value:content:ClientID::varchar as ClientID,
value:content:ClientNSID::varchar as ClientNSID,
value:content:ClientName::varchar as ClientName,
value:content:CompanyAddressCountryName::varchar as CompanyAddressCountryName,
value:content:CompanyPhoneNumber::varchar as CompanyPhoneNumber,
value:content:CreateNew::boolean as CreateNew,
value:content:CrmSearchOrg::varchar as CrmSearchOrg,
value:content:EEList:AccountID::varchar as EE_AccountID,
value:content:EEList:AccountName::varchar as EE_AccountName
from new_raw_json,
lateral flatten (input =>jsontext);
Here is the code I've tried. It only works when you put jsontext[Nth], i.e. when a single array index is hard-coded.
select
value:ID::varchar as ID,
value:EEListID::varchar as EEListID,
value:caseTypeID::varchar as caseTypeID
from new_raw_json,
lateral flatten (input => jsontext[0]:content:EEList);
Appreciate any help!
You can chain multiple lateral views using FLATTEN to continue exploding into nested structures (arrays within arrays).
Written out explicitly, the approach might look like this (only a few columns are projected here, to illustrate the levels):
SELECT
outer_object.value:caseTypeID AS caseTypeID,
outer_object.value:content.AccountID AS parentAccountID,
eelist_object.value:AccountID AS eeListAccountID,
allowance_object.value:AllowanceName AS allowanceName
FROM
new_raw_json,
LATERAL FLATTEN (input => jsontext) outer_object,
LATERAL FLATTEN (input => outer_object.value:content.EEList) eelist_object,
LATERAL FLATTEN (input => eelist_object.value:AllowanceList) allowance_object;
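Since the goal is one finalized table, the same chain can be wrapped in a CREATE TABLE ... AS SELECT with explicit casts. The following is only a sketch under a few assumptions: the table name ee_allowances is hypothetical, jsontext is assumed to be a VARIANT column (as the working query in the question implies), the VARCHAR casts simply mirror the ones used in the question, and OUTER => TRUE is added on the assumption that a case with an empty EEList, or an employee with an empty AllowanceList, should still produce a row (with NULLs) instead of disappearing.

```sql
-- Hypothetical target table name; adjust to your schema.
CREATE OR REPLACE TABLE ee_allowances AS
SELECT
    outer_object.value:ID::varchar                   AS ID,
    outer_object.value:caseTypeID::varchar           AS caseTypeID,
    outer_object.value:content.AccountID::varchar    AS AccountID,
    eelist_object.value:AccountID::varchar           AS EE_AccountID,
    eelist_object.value:AccountName::varchar         AS EE_AccountName,
    allowance_object.value:AllowanceName::varchar    AS AllowanceName,
    allowance_object.value:AllowanceAmount::varchar  AS AllowanceAmount
FROM
    new_raw_json,
    -- One row per element of the top-level array (no hard-coded index needed)
    LATERAL FLATTEN (input => jsontext) outer_object,
    -- OUTER => TRUE keeps the parent row when the nested array is empty or missing
    LATERAL FLATTEN (input => outer_object.value:content.EEList, outer => TRUE) eelist_object,
    LATERAL FLATTEN (input => eelist_object.value:AllowanceList, outer => TRUE) allowance_object;
```

Flattening jsontext itself first is what removes the need for jsontext[0]: every element of the outer array becomes its own row, and each later FLATTEN reads from the value of the level above it.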
Note that this only explodes one identified multi-value path (List -> EEList -> AllowanceList). It is unclear from the question if all the paths have to be exploded (such as List -> EEList -> Addresses AND AllowanceList) or if it is acceptable to store some of them as VARIANT (or other complex) type in the final result.
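If keeping a nested array as VARIANT is acceptable, one possible shape is to stop flattening one level early and project the array as-is. This is a sketch, not a recommendation: the column aliases are made up for illustration, and it assumes one output row per EEList entry is the desired grain.

```sql
SELECT
    outer_object.value:ID::varchar                 AS ID,
    eelist_object.value:AccountID::varchar         AS EE_AccountID,
    -- Not flattened: the whole array is kept in a single VARIANT column
    eelist_object.value:AllowanceList              AS AllowanceList,
    -- Number of elements in the retained array (NULL if it is not an array)
    ARRAY_SIZE(eelist_object.value:AllowanceList)  AS AllowanceCount
FROM
    new_raw_json,
    LATERAL FLATTEN (input => jsontext) outer_object,
    LATERAL FLATTEN (input => outer_object.value:content.EEList) eelist_object;
```

The retained column can still be flattened later in a downstream query if the individual allowances are needed.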
For example, if there is a need to duplicate AllowanceList values for every listed address in Addresses under EEList, this could be achieved by performing a JOIN of two exploding query results (one that chains List -> Addresses and another that chains List -> EEList -> AllowanceList).
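A rough sketch of that JOIN, with several assumptions stated up front: the CTE names and column aliases are hypothetical, Addresses is read from content.Addresses as it appears in the sample in the question (adjust the path if it actually sits under each EEList entry), and the top-level ID field is assumed to identify each case uniquely so that it can serve as the join key.

```sql
WITH addresses AS (
    -- Chain 1: List -> Addresses
    SELECT
        outer_object.value:ID::varchar  AS case_id,
        address_object.value            AS address   -- kept as VARIANT
    FROM
        new_raw_json,
        LATERAL FLATTEN (input => jsontext) outer_object,
        LATERAL FLATTEN (input => outer_object.value:content.Addresses) address_object
),
allowances AS (
    -- Chain 2: List -> EEList -> AllowanceList
    SELECT
        outer_object.value:ID::varchar                 AS case_id,
        eelist_object.value:AccountID::varchar         AS ee_account_id,
        allowance_object.value:AllowanceName::varchar  AS allowance_name
    FROM
        new_raw_json,
        LATERAL FLATTEN (input => jsontext) outer_object,
        LATERAL FLATTEN (input => outer_object.value:content.EEList) eelist_object,
        LATERAL FLATTEN (input => eelist_object.value:AllowanceList) allowance_object
)
SELECT
    al.case_id,
    al.ee_account_id,
    al.allowance_name,
    ad.address
FROM allowances al
-- Each allowance row is repeated once per address of the same case
JOIN addresses ad
    ON ad.case_id = al.case_id;
```

With an inner join, cases that have no addresses (such as the empty Addresses array in the sample) drop out of the result; a LEFT JOIN from allowances would keep them with a NULL address.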