I want to find "name_org1" records that are duplicated - no unique records. I am operating in Google Bigquery.
To do this I decided to use the ROW_NUMBER() windows function to rank all records, with duplicates being ranked in sequential order. Everything seems to work, except I cannot filter out unique records. In other words, I want my data set to only include ranks that are sequential (1,2, or 1,2,3, or 1,2,3,4, etc).
I do not want any one-to-ones (ie, unique records on each row). Can't seem to figure out how to exclude these. Tried using ChatGPT but I'm not satisfied.
Here's my query thus far, which yields the duplicates I'm looking for, but also includes the unique records:
SELECT
partner, name_org1,
ROW_NUMBER() OVER(PARTITION BY name_org1) as duplicates
FROM `gcp-gfs-datalake-core-prd.sap_p05__saphanadb__views_current.but000`
I feel I've exhausted most of the options I know how to use, including using CTEs or aggregates, but I'm at a loss right now.
Appreciate the help.
Looks like you need COUNT(*)
window function
SELECT
b.partner,
b.name_org1
FROM (
SELECT *,
COUNT(*) OVER (PARTITION BY name_org1) as count
FROM `gcp-gfs-datalake-core-prd.sap_p05__saphanadb__views_current.but000` b
) b
WHERE b.count > 1;