
What is the most efficient way to copy a CosmosDB collection and retain the order of items by physical partition?


I've tried many different combinations in Azure Data Factory to create a clone of a CosmosDB collection that keeps the order in which items were written to a partition, but unless I set the write batch size to 1, the order is not preserved. Even triggering a mapping data flow from the source's Change Feed does not preserve order. We have written a simple tool that copies one record at a time, but that is obviously slow.

We are using Cosmos as an event store, and the change feed processor feeds our projectors; it all works really well, but we would like to copy the events to a different environment to test changes. That requires the original write order to be preserved.
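To illustrate why a projector depends on write order, here is a minimal sketch assuming a hypothetical last-write-wins projection (the `Event` type and `project_status` function are illustrative, not part of any Cosmos SDK). Replaying the same two events in a different order produces a different read model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    stream_id: str  # hypothetical aggregate id (e.g. the partition key)
    status: str     # hypothetical event payload


def project_status(events):
    """Last-write-wins projection: the latest event for each stream sets its status."""
    state = {}
    for event in events:
        state[event.stream_id] = event.status
    return state


written = [Event("order-1", "Placed"), Event("order-1", "Shipped")]
print(project_status(written))                  # {'order-1': 'Shipped'}
print(project_status(list(reversed(written))))  # {'order-1': 'Placed'}
```

Any projection that is not order-independent (overwrites, running balances with guards, state machines) behaves this way, which is why a copy that reorders events within a stream is not a faithful test environment.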

Thanks in advance.


Solution

  • The change feed processor does read from each physical partition in _ts order.

    I've certainly used this to copy very large collections (> 1 TB) in a few hours.

    For this I used a function app scaled out across multiple instances, made sure the leases collection had enough max RU configured that it would not become a bottleneck, and, when provisioning the target, scaled its RU up high enough to create the desired number of physical partitions up front rather than having partitions split during the import.

    I have always used bulk insert, though, so within each batch delivered by the change feed processor the _ts order could presumably end up scrambled. That has never mattered for my use case.

    The most efficient way to copy the collection to a new one while preserving the _ts order would certainly be to restore a backup.

    It also has the benefit that you don't have to write any code or provision any resources to do it. If you are not already using the continuous backup model, you should consider switching to it, as it makes restores self-service and lets you restore to a specified point in time.
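If a code-based change feed copy is still needed, one middle ground between a batch size of 1 and unordered bulk insert is to split each change feed batch by partition key, write the groups concurrently, and write sequentially within each group. The sketch below is an in-memory simulation under stated assumptions: `FakeTargetContainer`, `copy_batch` and the `pk` field are hypothetical stand-ins, not Cosmos SDK APIs, and it assumes the handler finishes one batch before the next batch for the same lease is delivered. It only preserves order per partition key, which matches the change feed's documented ordering guarantee (per logical partition key, not across keys).

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class FakeTargetContainer:
    """Hypothetical in-memory stand-in for the target container.

    Records the order in which documents arrive for each partition key.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self.writes = defaultdict(list)  # partition key -> ids in arrival order

    def upsert_item(self, doc):
        with self._lock:
            self.writes[doc["pk"]].append(doc["id"])


def group_by_partition_key(batch):
    """Split a batch into per-key lists, keeping the feed's order inside each key."""
    groups = defaultdict(list)
    for doc in batch:
        groups[doc["pk"]].append(doc)
    return groups


def write_in_order(container, docs):
    # Sequential writes: each document is written only after the previous one.
    for doc in docs:
        container.upsert_item(doc)


def copy_batch(container, batch, pool):
    """Write one change feed batch: parallel across keys, ordered within a key."""
    futures = [
        pool.submit(write_in_order, container, docs)
        for docs in group_by_partition_key(batch).values()
    ]
    # Wait for the whole batch so the next batch cannot overtake it.
    for future in futures:
        future.result()


if __name__ == "__main__":
    # Hypothetical batches as a change feed handler might receive them.
    batches = [
        [{"id": "a1", "pk": "A"}, {"id": "b1", "pk": "B"}, {"id": "a2", "pk": "A"}],
        [{"id": "b2", "pk": "B"}, {"id": "a3", "pk": "A"}],
    ]
    target = FakeTargetContainer()
    with ThreadPoolExecutor(max_workers=8) as pool:
        for batch in batches:
            copy_batch(target, batch, pool)
    print(sorted(target.writes.items()))
    # [('A', ['a1', 'a2', 'a3']), ('B', ['b1', 'b2'])]
```

Throughput then scales with the number of distinct partition keys in each batch rather than with batch size. Note that the target assigns its own _ts at write time, so only the relative order within a key carries over, not the original timestamps.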