In a previous post, see here: How update a table in Big Query where the name of fields to update are values in another table
I had two table and the following problem:
Table 1:
+-------+------------+---------+
| ID | field_name | value |
+-------+------------+---------+
| 1 | usd | 10.08 |
| 1 | gross_amt | 52.0 |
| 1 | jpy | 30.05 |
| 2 | usd | 50.0 |
| 2 | eur | 50.0 |
| 3 | real_amt | 210.43 |
| 3 | total | 320 |
| 4 | jpy | 23.45 |
| 4 | name | john |
| 4 | city | utah |
+-------+------------+---------+
Table 2:
+-----+-------+-----------+----------+---------+------+-------+-------+-------+-----------+----------+-------+-----+----------+
| ID | name | last_name | date1 | counrty | city | usd | eur | jpy | gross_amt | real_amt | total | ... | field200 |
+-----+-------+-----------+----------+---------+------+-------+-------+-------+-----------+----------+-------+-----+----------+
| 1 | jane | doe | 19900108 | usa | LA | 9.08 | 0.00 | 29.05 | 50.0 | 52.0 | 900.0 | ... | value200 |
| 2 | lane | smith | 19900108 | usa | LA | 40.8 | 40.0 | 0.00 | 100.0 | 70.0 | 290.0 | ... | value200 |
| 3 | mike | hoffa | 19900108 | usa | SF | 5.05 | 0.00 | 0.00 | 10.0 | 25.0 | 100.0 | ... | value200 |
| 4 | paul | doe | 19900108 | usa | NY | 1.00 | 0.00 | 29.05 | 45.0 | 55.0 | 110.0 | ... | value200 |
+-----+-------+-----------+----------+---------+------+-------+-------+-------+-----------+----------+-------+-----+----------+
Need to update the values of the fields in the table 2, which are in table 1 in column field_name
, with the values of table 1 column value
, both IDs are the same in both tables, beside that, the datatype of column value
in table 1 are string, but the data type of the columns to update in the table 2 are diferent, especially the numbers(numeric, int64, float64)
The tables above are an example, table 2 of the real problem has 200 fields and in table 1 for an ID there can be up to 40 value modifications for thousands of records to be modified daily
BUT NOW I HAVE A NEW PROBLEM
In the new table1 (Table 3):
Table 3:
+-------+------------+---------+-----------------------+------------------+
| ID | field_name | value | value_replaced_table2 | diff |
+-------+------------+---------+-----------------------+------------------+
| 1 | usd | 10.08 | 9.08 | abs(10.08-9.08) |
| 1 | gross_amt | 52.0 | 50.0 | abs(52.0-50.0) |
| 1 | jpy | 30.05 | 29.05 | abs(30.05-29.05) |
| 2 | usd | 50.0 | 40.08 | abs(50.0-40.0) |
| 2 | eur | 50.0 | 40.0 | ...... |
| 3 | real_amt | 210.43 | 25.0 | ...... |
| 3 | total | 320 | 100.0 | ...... |
| 4 | jpy | 23.45 | 29.05 | abs(23.45-29.05) |
| 4 | name | john | paul | john |
| 4 | city | utah | NY | utah |
+-------+------------+---------+-----------------------+------------------+
I need insert into new table 1 (table 3) in the column value_replaced_table2
the value replaced in the table 2, thus storing the replaced value in table 2 above, and calculate the differences between both values, (the new value to update and the old value replaced in table 2, note that the data types in the new table 1 (table 3) are string, and those in table 2 are (numeric, int64, float64)
From now, thanks for your answers!
Using the previously created pivot1
table, you can use it before executing the final MERGE
with table2
to get the old_values
that will change.
Then, you will need to unpivot the results to get table3
. Example using your sample data
:
-- Same pivot1 table as before
EXECUTE IMMEDIATE '''
CREATE TEMP TABLE pivot1 AS
SELECT id, ''' || (
SELECT STRING_AGG(DISTINCT "MAX(IF(field_name = '" || field_name || "', CAST(value AS " || data_type || "), NULL)) AS " || field_name)
FROM `project.dataset.table1`
JOIN (
SELECT column_name, data_type
FROM `project.dataset.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name = 'replica2'
) ON field_name = column_name
) || '''
FROM `project.dataset.table1`
GROUP BY id
''';
-- Table3
EXECUTE IMMEDIATE '''
CREATE OR REPLACE TABLE project.dataset.table3 AS
SELECT a.id, values.column_name as field_name, values.new_value as value, values.old_value as value_replaced_table2,
CASE
WHEN values.data_type = "STRING" THEN values.new_value
WHEN values.data_type = "INT64" THEN CAST(ABS(CAST(values.new_value AS INT64) - CAST(values.old_value AS INT64)) AS STRING)
ELSE CAST(ABS(CAST(values.new_value AS FLOAT64) - CAST(values.old_value AS FLOAT64)) AS STRING)
END as diff
FROM (
SELECT t1.id, [''' || (
SELECT STRING_AGG(DISTINCT "STRUCT('" || column_name || "' as column_name, CAST(t1." || column_name || " AS STRING) as new_value, CAST(t2." || column_name || " AS STRING) as old_value, '" || data_type || "' as data_type)")
FROM `project.dataset.table1`
JOIN (
SELECT column_name, data_type
FROM `project.dataset.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name = 'replica2'
) ON field_name = column_name
) || '''] AS values
FROM `project.dataset.table2` AS t2
JOIN pivot1 AS t1
ON t2.id = t1.id ) a
CROSS JOIN UNNEST(a.values) as values
WHERE values.new_value IS NOT NULL
''';
SELECT * FROM `project.dataset.table3` ORDER BY id;
Notice that the FLOAT64 difference will get an approximate value when casted to STRING, so if you want to round up the difference, you can use a cast to NUMERIC instead of FLOAT64, e.g.:
...
ELSE CAST(ABS(CAST(values.new_value AS NUMERIC) - CAST(values.old_value AS NUMERIC)) AS STRING)
...
-- Instead of 9.20000000000000028, it will appear as 9.2