How can I find duplicate data entries in one specific table in clickhouse DB?
I am actually investigating on a merge tree table and actually threw optimize statements at my table but that didn't do the trick. The duplicate entries still persist.
Preferred would be to have a universal strategy without referencing individual column names.
I only want to see the duplicate entries, since I am working on very large tables.
The straight forward way would be to run this query.
SELECT
*,
count() AS cnt
FROM myDB.myTable
GROUP BY *
HAVING cnt > 1
ORDER BY date ASC
If that query gets to big you can run it in pieces.
SELECT
*,
count() AS cnt
FROM myDB.myTable
WHERE (date >= '2020-08-01') AND (date < '2020-09-01')
GROUP BY *
HAVING cnt > 1
ORDER BY date ASC