google-bigqueryduplicatessql-updatepartition-by

How to update a column value with 1 when the table has duplicated records in Big query


I am trying to update a column in a table when they are duplicated rows found. Initially, I tried to partition by all columns but later found that FLOAT dtypes are not allowed.

update table tabale_name
set column_to_update =1 
where exists(SELECT *,COUNT(*) OVER (PARTITION BY column1,column2,column3) AS duplicate_count FROM `forward-cacao-351005.dq_temp.dq_sample_data`) WHERE duplicate_count > 1)

Error:-Partitioning by expressions of type FLOAT64 is not allowed at [1:120]

After this I tried to find the duplicates using google solution which is responding back with the correct records but when updating the table it updates the whole table but not where only duplicate records are present.

update `table_name` 
SET column_to_update = 1
WHERE EXCEPT(select 1 from (SELECT u.*
FROM `table_name`  u
WHERE TRUE
QUALIFY ROW_NUMBER() OVER (PARTITION BY TO_JSON_STRING(u)) > 1)
)

The above query updates all the records but not the duplicated records.

What is the best way to update the column of the same table which are having the duplicated records?


Solution

  • For your requirement,you can consider the below query as an example:

    UPDATE `my_table`
    SET column_to_update = 1
    WHERE id IN (
      SELECT id
      FROM `my_table` GROUP BY id
      HAVING COUNT(*) > 1
    );
    

    Result:

    image

    To solve the FLOAT64 is not allowed error, you can consider using a cast function. For more details you can refer to this stack thread.