I have a table consisting of ~200 million rows and 6 columns, for which I only want to keep rows that are unique across all columns.
I am currently doing a CREATE TABLE temp AS SELECT DISTINCT * FROM plates;
This has run for 10 hours and is still running so I must be doing something wrong. I read about indexing and grouping by rowid's, but I do not understand how this would speed up the process since it is still evaluating every row and every column. I am mainly looking for insight here, since I am probably missing something.
You could try:
create table temp (
a,b,c,d,e,f,
primary key (a,b,c,d,e,f) on conflict ignore
);
insert into temp (a,b,c,d,e,f)
select a,b,c,d,e,f from plates;
where a
,b
,c
,d
,e
,f
are the relevant columns.