I am working on a .NET 6 project with EF Core 6.0. The project is built using Clean Architecture and the CQRS pattern. My question is fairly general, and I am looking for opinions rather than straight answers.
So imagine that I have a command that deletes a user from the database; let's call it DeleteUserByIdCommand. The command takes a user id and checks whether the user exists; if the user exists, it is deleted. At the end of the command I call context.SaveChanges() so the deletion actually happens in the database. The problem comes when I add another command, let's name it DeleteUsersByIdBulkCommand. In this command, for the sake of not repeating myself, I want to reuse my existing DeleteUserByIdCommand. The new command takes an array of user ids. This way, though, if I have 100 users to delete, the user lookup and the SaveChanges() call will each be executed 100 times, which is terrible for performance.
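To make the setup concrete, here is a rough sketch of what the two handlers could look like. The handler shape and the names AppDbContext, Users, and UserId are assumptions, since the actual code is not shown:

public record DeleteUserByIdCommand(int UserId);
public record DeleteUsersByIdBulkCommand(int[] UserIds);

// Hypothetical single-delete handler: one lookup and one commit per call.
public class DeleteUserByIdCommandHandler
{
    private readonly AppDbContext _context;
    public DeleteUserByIdCommandHandler(AppDbContext context) => _context = context;

    public async Task Handle(DeleteUserByIdCommand command, CancellationToken ct = default)
    {
        var user = await _context.Users.FindAsync(new object[] { command.UserId }, ct);
        if (user is null) return;

        _context.Users.Remove(user);
        await _context.SaveChangesAsync(ct);   // one round of commit work per user
    }
}

// Hypothetical bulk handler that reuses the single-delete handler:
// for 100 ids this runs 100 lookups and 100 SaveChanges calls.
public class DeleteUsersByIdBulkCommandHandler
{
    private readonly DeleteUserByIdCommandHandler _deleteUser;
    public DeleteUsersByIdBulkCommandHandler(DeleteUserByIdCommandHandler deleteUser) => _deleteUser = deleteUser;

    public async Task Handle(DeleteUsersByIdBulkCommand command, CancellationToken ct = default)
    {
        foreach (var id in command.UserIds)
            await _deleteUser.Handle(new DeleteUserByIdCommand(id), ct);
    }
}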
For me there are only two simple options, and I have tried both:
1. Not use the existing DeleteUserByIdCommand and write new code in the new command. The problem with that is that if, for example, I later want to extend the command to also delete the user from some other database or a third-party API, I will have to change the code in two places.
2. Add a boolean property called ShouldCallSaveChanges to DeleteUserByIdCommand. By default it is true, but when the command is called from DeleteUsersByIdBulkCommand it is set to false, and I just execute SaveChanges() once after the foreach (see the sketch after this list).
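A rough sketch of that second option, reusing the hypothetical names from the sketch above and showing only the parts that change:

public record DeleteUserByIdCommand(int UserId)
{
    // True when the command is used on its own; the bulk command sets it to false.
    public bool ShouldCallSaveChanges { get; init; } = true;
}

// In DeleteUserByIdCommandHandler.Handle, the save becomes conditional:
if (command.ShouldCallSaveChanges)
    await _context.SaveChangesAsync(ct);

// In DeleteUsersByIdBulkCommandHandler.Handle, commit once after the loop:
foreach (var id in command.UserIds)
    await _deleteUser.Handle(new DeleteUserByIdCommand(id) { ShouldCallSaveChanges = false }, ct);

await _context.SaveChangesAsync(ct);

Note that this only works if both handlers share the same DbContext instance, which is the usual setup when the context is registered as scoped.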
My question is whether there is some strict rule for cases like this one. Is it a problem to repeat code here, or is there some specific solution for this case? I would like to hear your input or any other suggestions you may have :)
This is an example of the general question, "why are my bulk data manipulation statements so absurdly slow?" The general answer with typical RDBMS servers is, "because the lion's share of the work in data manipulation occurs during the commit phase".
In pure-SQL applications, this means you should do a sequence of operations like the following to delete multiple rows efficiently:
BEGIN TRANSACTION;
DELETE FROM mytable WHERE whatever;
DELETE FROM mytable WHERE whatever;
DELETE FROM mytable WHERE whatever;
COMMIT;
When you're in a transaction the RDBMS batches up the operations, and when you COMMIT it does all the work to apply them. (This is an oversimplification, but a perfectly useful mental model for almost all applications involving less than tens of thousands of operations.)
How does this translate to EF Core? The .SaveChanges() documentation says this:
For most database providers, SaveChanges is transactional.
The code you mentioned, by invoking .SaveChanges() after every operation, implicitly runs a transaction for every operation. That makes your code easy for its callers to use, but it makes it less than great for bulk operations like:
foreach (var id in arrayOfIds) {
    ctx.DeleteUserById(id);
}
The easiest way to structure your code -- without having to build some kind of magic bulk-mode property -- is to always call .SaveChanges() explicitly, and not make it implicit.
foreach (var id in arrayOfIds) {
    ctx.DeleteUserById(id);
}
ctx.SaveChanges();
This will be (unless you have thousands upon thousands of users to delete at once) just about as fast as code rewritten specifically for bulk deletion. But you do have to remember to call .SaveChanges().
When I do this stuff I write an updater class that implements IDisposable, and in its Dispose method I call .SaveChanges().
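A minimal sketch of such an updater, assuming the same hypothetical AppDbContext and User entity (the method names just mirror the usage below):

public sealed class MyUpdater : IDisposable
{
    // Assumes AppDbContext can configure itself (e.g. in OnConfiguring);
    // in a DI setup you would inject the context instead.
    private readonly AppDbContext _ctx = new();

    public void DeleteUserById(int id)
    {
        var user = _ctx.Users.Find(id);
        if (user is not null)
            _ctx.Users.Remove(user);
    }

    public void AddUser(User user) => _ctx.Users.Add(user);
    public void ChangeUser(User user) => _ctx.Users.Update(user);

    public void Dispose()
    {
        _ctx.SaveChanges();   // everything queued above is committed here, in one go
        _ctx.Dispose();
    }
}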
Then I do something like this.
using (var upd = new MyUpdater()) {
    upd.DeleteUserById(whatever);
    upd.AddUser(whatever);
    upd.ChangeUser(whatever);
}
When my instance of MyUpdater goes out of scope in the using, its .Dispose method gets called automatically. This has the added benefit of using a single database transaction for a bunch of different operations, if that's what you want to do.
That call to .Dispose() always happens when the using block is exited, even if an exception is thrown somewhere inside it.