I understand that whenever I create a Global Secondary Index (GSI) for a DynamoDB table it will take some time to create that GSI (depending on table size).
From what I understood, loading the items from the base table to the GSI only consumes the WCU of the GSI.
Let's assume I have a DynamoDB table with terabytes of data in it. If I create a GSI with 1 WCU, how long will it take for the GSI to be created (if all the items and values have to be projected)? Could it take as long as multiple months? (The doc states it takes around 5 minutes.)
Indeed, when you add a GSI to a table with pre-existing data, a so-called backfilling process begins that reads the table's data and writes it to the GSI. There is no guarantee that this process will finish in 5 minutes. The documentation explains that the base table's RCUs are not consumed, but the new index's WCUs are, so if you provision too few WCUs on the index, the backfilling will be slow. For example, this document, section "Adding a Global Secondary Index to a large table", says that:
The time required for building a global secondary index depends on several factors, such as the following: ... The provisioned write capacity of the index ... If you are adding a global secondary index to a very large table, it might take a long time for the creation process to complete.
... If the provisioned write throughput setting on the index is too low, the index build will take longer to complete. To shorten the time it takes to build a new global secondary index, you can increase its provisioned write capacity temporarily. As a general rule, we recommend setting the provisioned write capacity of the index to 1.5 times the write capacity of the table. This is a good setting for many use cases. However, your actual requirements might be higher or lower.
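To put a rough number on the 1 WCU scenario from the question, here is a back-of-the-envelope sketch. It assumes that one WCU covers one write per second for an item of up to 1 KB (larger items round up to the next KB), that the backfill is limited only by the index's provisioned WCU, and that it runs at exactly that rate. The function name, item count, and item size are hypothetical, so treat the result as an order-of-magnitude estimate rather than a prediction:

```python
import math


def estimate_backfill_seconds(item_count: int, avg_item_bytes: int, index_wcu: int) -> float:
    """Rough estimate of GSI backfill time, assuming index writes are the only bottleneck."""
    # Assumption: one WCU = one write per second for an item up to 1 KB;
    # larger items consume one WCU per started KB.
    wcu_per_item = max(1, math.ceil(avg_item_bytes / 1024))
    return item_count * wcu_per_item / index_wcu


# Hypothetical table: 1 billion items of about 1 KB each (roughly 1 TB).
seconds = estimate_backfill_seconds(1_000_000_000, 1024, 1)
print(f"{seconds / 86_400 / 365:.1f} years at 1 WCU")

seconds = estimate_backfill_seconds(1_000_000_000, 1024, 10_000)
print(f"{seconds / 3_600:.1f} hours at 10,000 WCU")
```

Under these assumptions, 1 WCU works out to decades rather than months, while 10,000 WCU brings the same backfill down to a little over a day. That is why the documentation suggests raising the index's write capacity temporarily.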
The document recommends that you look at the OnlineIndexPercentageProgress CloudWatch metric to understand the amount of progress that the backfilling is making.
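As a minimal sketch of how that metric could be polled with boto3, the snippet below builds a `get_metric_statistics` request for OnlineIndexPercentageProgress. The helper names, the 15-minute window, and the 60-second period are my own choices for illustration, and it assumes your AWS credentials and region are already configured:

```python
from datetime import datetime, timedelta, timezone


def progress_query(table_name, index_name, now=None):
    """Build the CloudWatch request for the GSI backfill progress metric."""
    now = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/DynamoDB",
        "MetricName": "OnlineIndexPercentageProgress",
        "Dimensions": [
            {"Name": "TableName", "Value": table_name},
            {"Name": "GlobalSecondaryIndexName", "Value": index_name},
        ],
        "StartTime": now - timedelta(minutes=15),  # arbitrary look-back window
        "EndTime": now,
        "Period": 60,
        "Statistics": ["Maximum"],
    }


def latest_progress(table_name, index_name):
    """Return the most recent progress percentage, or None if no datapoint exists yet."""
    import boto3  # imported here so the request builder can be used without boto3 installed

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(**progress_query(table_name, index_name))
    datapoints = sorted(response["Datapoints"], key=lambda d: d["Timestamp"])
    return datapoints[-1]["Maximum"] if datapoints else None
```

Calling `latest_progress("MyTable", "MyIndex")` (both names are placeholders) every few minutes and comparing successive values should give a feel for how fast the backfill is actually moving.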
The same document also raises two more reasons why the backfilling process might be slower than you hoped: