sqldatabasepostgresqldatabase-normalizationpostgresql-13

How to unnest multiple columns while including nulls


I have a table where id is a unique not null integer:

id site_names site_addresses industries feis
30 Borden Incorporated 198 Saluda St , Chester , SC , 29706-1579 , United States|198 Saluda St, Chester, SC 29706, USA|198 Saluda St Chester SC 29706-1579 United States Food and Cosmetics 12345|45678
31 Butterkrust Bakeries, Inc.|Flowers Baking Co. of Lakeland, LLC|Southern Bakeries, Inc. dba Butterkrust Bakeries null Food|Food and Cosmetics 12345
33 Church & Dwight Canada Corp. 5485 RUE FERRIER , , MONTREAL, QUEBEC Quebec , , -- , CA null null

I want to split the table into a materialized view where each row is one of the combinations possible when you split up site_names, site_addresses, industries, and feis; all may have nulls that I want.

A few rows:

id site_name site_address industry fei
30 Borden Incorporated 198 Saluda St , Chester , SC Food and Cosmetics 12345
30 Borden Incorporated 198 Saluda St , Chester , SC Food and Cosmetics 45678
30 Borden Incorporated 198 Saluda St, Chester, SC 29706, USA Food and Cosmetics 12345
30 Borden Incorporated 198 Saluda St, Chester, SC 29706, USA Food and Cosmetics 45678
...
31 Butterkrust Bakeries, Inc. null Food 12345
31 Flowers Baking Co. of Lakeland, LLC null Food 12345

The closest I got:

create materialized view site_data_split as
(
with Expanded2 as (
    select raw_site_data.id as id_fei, feis.feis
    from raw_site_data,
        unnest(string_to_array(raw_site_data.feis, '|')) feis
    ),
    Expanded3 as (
        select raw_site_data.id as id_name,
             site_names.site_names
        from raw_site_data,
            unnest(string_to_array(raw_site_data.site_names, '|')) site_names
    ),
    Expanded4 as (
        select raw_site_data.id as id_address,
            site_addresses.site_addresses
        from raw_site_data,
            unnest(string_to_array(raw_site_data.site_addresses, '|')) site_addresses
    ),
     Expanded5 as (
         select raw_site_data.id as id_industry,
                industries.industries
         from raw_site_data,
              unnest(string_to_array(raw_site_data.industries, '|')) industries
    )
select id_fei as site_id, feis as fei, site_names as site_name,
    site_addresses as site_address, industries as industry
from Expanded2, Expanded3, Expanded4, Expanded5
where Expanded2.id_fei = Expanded3.id_name
and Expanded3.id_name = Expanded4.id_address
and Expanded4.id_address = Expanded5.id_industry
);

But that does not include any of the rows with nulls.

How can I do this query while including rows with nulls in the result?


Solution

  • If you want all combinations, you can use a single query:

    select rd.id, site_name, site_address, fei
    from raw_data rd left join lateral
         regexp_split_to_table(rd.site_names, '\|') site_name
         on 1=1 left join lateral
         regexp_split_to_table(rd.site_addresses, '\|') site_address
         on 1=1 left join lateral
         regexp_split_to_table(rd.feis, '\|') fei
         on 1=1;