I have 2 source tables at the moment.
Table #1: sourceTableMain
|EmployeeNumber| DepartmentNumber | CostCenterNumber |
| -------------| ---------------- |------------------|
| 1 | 100 | 1001 |
| 2 | 200 | 1001 |
| 3 | 100 | 1002 |
Table #2: sourceTableEmployee
|EmployeeNumber| EmployeeFirstName | EmployeeLastName | EmployeeAddress |
| -------------| ---------------- |------------------|---------------- |
| 1 | Michael | Scott | 110 ABC Ln |
| 1 | Michael | Scott | 450 XYZ Ln |
| 2 | Dwight | Schrute | 321 PQR St |
| 3 | Jim | Halpert | 678 LMN Blvd |
I am trying to insert the combine the rows into a 3rd table named targetTableCombined which has the following schema:
FieldName | Type | Mode | |
---|---|---|---|
employeeNumber | INTEGER | NULLABLE | |
employeeDetails | (struct) | RECORD | REPEATED |
employeeFirstName | STRING | NULLABLE | |
employeeLastName | STRING | NULLABLE | |
employeeAddress | STRING | NULLABLE |
Within the target table (targetTableCombined), I am trying to make sure that for each employeeNumber, all of the First Names, Last Names and Addresses are repeated under a single struct array. For example, EmployeeNumber 1 should have only 1 row in the target table, with the first name, last name and different addresses as part of the second column (struct), each in a separate row.
I wrote an insert script to do this, but I am going wrong:
insert into `dev.try_sbx.targetTableCombined`
select
main.employeeNumber,
array(
select as struct
emp.employeeFirstName,
emp.employeeLastName,
emp.employeeAddress
)
from
`dev.try_sbx.sourceTableMain` as main
inner join `dev.try_sbx.sourceTableEmployee` as emp
on main.EmployeeNumber = emp.EmployeeNumber;
This is the result I am getting when running the query above:
| EmployeeNumber | EmployeeDetails |
| ------------- | ------------------------------ |
| 1 | [Michael, Scott, 110 ABC Ln] |
| 1 | [Michael, Scott, 450 XYZ Ln] |
| 2 | [Dwight, Schrute, 321 PQR St] |
| 3 | [Jim, Halpert, 678 LMN Blvd] |
(Sorry about not being able to share screenshots - I don't have enough rep. But to elaborate, I am expecting only 3 rows on the insert (employee 1 should have had a single array containing both addresses). I am instead, getting 4 rows after the insert.)
Where am I going wrong with my script?
It's because ARRAY()
is not an aggregation function. You should ARRAY_AGG()
along with GROUP BY
to group details for each employee into an array.
SELECT EmployeeNumber,
ARRAY_AGG((SELECT AS STRUCT EmployeeFirstName, EmployeeLastName, EmployeeAddress)) AS employeeDetails
FROM `dev.try_sbx.sourceTableEmployee`
GROUP BY 1;
More preferred way is :
SELECT EmployeeNumber,
ARRAY_AGG(STRUCT(EmployeeFirstName, EmployeeLastName, EmployeeAddress)) AS employeeDetails
FROM `dev.try_sbx.sourceTableEmployee`
GROUP BY 1;
output: