I am using PostgreSQL 15 hosted on AWS Aurora Serverless v2. I have a table with 200 million rows and I need to create an index on it. The problem is that creating the index would take a few hours, which would impact general operations on that table in the production environment. I don't want to change the DB instance capacity for this.
I am looking for a way to create the index slowly. I don't mind if it takes a few days or even a few weeks. For example, I could write a script that runs in the background and adds data to the new index chunk by chunk. Is it possible to do this?
You can `CREATE INDEX CONCURRENTLY`, and for the session that runs it you can set `max_parallel_maintenance_workers` to 0 so the build uses no more than a single CPU, and lower `maintenance_work_mem` to limit its memory consumption. These will make the process slower overall, but there's not much you can do beyond that.
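A minimal sketch of what that could look like, assuming a hypothetical `orders` table and `customer_id` column (substitute your own DDL):

```sql
-- Keep the build on a single process with a small memory budget.
SET max_parallel_maintenance_workers = 0;  -- leader process only, no parallel workers
SET maintenance_work_mem = '64MB';         -- cap the memory used for the build

-- CONCURRENTLY avoids blocking writes, at the cost of a slower build.
-- Note: it cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);
```

Keep in mind that if a concurrent build fails or is cancelled, it leaves an `INVALID` index behind that you must drop before retrying.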
If you shared the DDL and elaborated on what you mean by "impact" and how much of it you can afford, it would be easier to come up with something. For example, if it's a partitioned table, creating an index on it triggers the same on all partitions. In that scenario you could instead "manually" create the individual partition indexes one by one. There's a mention of this in the `CREATE INDEX` doc:
> `ONLY`
>
> Indicates not to recurse creating indexes on partitions, if the table is partitioned. The default is to recurse.

> Concurrent builds for indexes on partitioned tables are currently not supported. However, you may concurrently build the index on each partition individually and then finally create the partitioned index non-concurrently in order to reduce the time where writes to the partitioned table will be locked out. In this case, building the partitioned index is a metadata only operation.
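Here is a sketch of that per-partition approach, assuming a hypothetical partitioned `events` table with partitions `events_2023` and `events_2024` (all names are placeholders):

```sql
-- 1. Create the parent index without recursing into partitions.
--    It starts out INVALID; this step is a metadata-only operation.
CREATE INDEX idx_events_user_id ON ONLY events (user_id);

-- 2. Build each partition's index concurrently, one at a time,
--    pacing the work however you like (e.g. one partition per night),
--    and attach each one to the parent index as it finishes.
CREATE INDEX CONCURRENTLY idx_events_2023_user_id ON events_2023 (user_id);
ALTER INDEX idx_events_user_id ATTACH PARTITION idx_events_2023_user_id;

CREATE INDEX CONCURRENTLY idx_events_2024_user_id ON events_2024 (user_id);
ALTER INDEX idx_events_user_id ATTACH PARTITION idx_events_2024_user_id;

-- 3. Once an index has been attached for every partition,
--    the parent index automatically becomes valid.
```

This only helps if the table is already partitioned, of course; partitioning an existing 200-million-row table would be a separate project.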