arraysgoogle-cloud-platformdata-structuresgoogle-bigqueryarray-agg

Bigquery - Repeated field inside of a repeated record


I have a denormalised table and I want to simplify it using repeated fields inside of BigQuery. To illustrate what I am trying to do, I want to go from:

|CustomerNumber|InvoiceNumber|InvoiceLineItem|
|--------------|-------------|---------------|
|78278278      |8765         |VV190          |
|78278278      |8765         |VV191          |
|78278278      |9321         |VV198          |
|78278278      |9321         |VV199          |

To:

|CustomerNumber|InvoiceNumber [REPEATED]|InvoiceLineItem [REPEATED]|
|--------------|------------------------|--------------------------|
|78278278      |8765                    |VV190                     |
|              |                        |VV191                     |
|              |------------------------|--------------------------|
|              |9321                    |VV198                     |
|              |                        |VV199                     |

I am able to create this type of schema in BigQuery but unable to write the SQL Query to go from my denormalised data to my desired table schema.

[
  {
    "name": "CustNumber",
    "type": "STRING"
  },
  {
    "fields": [
      {
        "name": "InvoiceNumber",
        "type": "STRING"
      },
      {
        "mode": "REPEATED",
        "name": "InvoiceLineItem",
        "type": "STRING"
      }
    ],
    "mode": "REPEATED",
    "name": "OrderInfo",
    "type": "RECORD"
  }
]

Screenshot of the schema as seen in the Bigquery Console:

Any help would be greatly appreciated. Thanks :D

Additional ressources - A Sample to get started playing:

WITH
  DATA AS (
  SELECT
    "78278278" AS CustomerNumber,
    "8765" AS InvoiceNumber,
    "VV190" AS InvoiceLineItem
  UNION ALL
  SELECT
    "78278278" AS CustomerNumber,
    "8765" AS InvoiceNumber,
    "VV191" AS InvoiceLineItem
  UNION ALL
  SELECT
    "78278278" AS CustomerNumber,
    "9321" AS InvoiceNumber,
    "VV198" AS InvoiceLineItem
  UNION ALL
  SELECT
    "78278278" AS CustomerNumber,
    "9321" AS InvoiceNumber,
    "VV199" AS InvoiceLineItem )

Solution

  • Try this

    WITH
      DATA AS (
      SELECT
        "78278278" AS CustomerNumber,
        "8765" AS InvoiceNumber,
        "VV190" AS InvoiceLineItem
      UNION ALL
      SELECT
        "78278278" AS CustomerNumber,
        "8765" AS InvoiceNumber,
        "VV191" AS InvoiceLineItem
      UNION ALL
      SELECT
        "78278278" AS CustomerNumber,
        "9321" AS InvoiceNumber,
        "VV198" AS InvoiceLineItem
      UNION ALL
      SELECT
        "78278278" AS CustomerNumber,
        "9321" AS InvoiceNumber,
        "VV199" AS InvoiceLineItem )
    
        
    SELECT CustomerNumber, ARRAY_AGG(STRUCT (InvoiceNumber,InvoiceLineItem)) OrderInfo FROM (
        SELECT CustomerNumber, InvoiceNumber, ARRAY_AGG(InvoiceLineItem) InvoiceLineItem FROM DATA
        GROUP BY CustomerNumber, InvoiceNumber    
    )
    GROUP BY CustomerNumber