I have a somewhat large set (~2000 images) of medical images that I plan to use to train a CV model (using the EfficientNet architecture) at my workplace. In preparation, I have been reading up on good practices for training on medical images. I have split the dataset by patient to prevent leakage, and split my data into train:test:val in the ratio 60:20:20. However, I read that k-fold cross-validation is a newer practice than using a single validation set, but I was advised against it because k-fold is supposedly far more complicated. What would you recommend in this instance, and are there any other good practices I should adopt?
A train:test split with cross-validation on the training set is part of the standard workflow taught in many machine learning courses. For an example and further details, I recommend the excellent sklearn article on it.
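A minimal sketch of that workflow, using synthetic data and a placeholder classifier rather than your images, might look like the following (names and sizes here are illustrative assumptions):

```python
# Standard workflow: hold out a test set once, then run k-fold
# cross-validation on the remaining training data for tuning.
# The data and model are synthetic stand-ins, not medical images.
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))     # stand-in for extracted image features
y = rng.integers(0, 2, size=200)   # binary labels

# Single held-out test set, never touched during tuning
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 5-fold cross-validation on the training portion only
scores = cross_val_score(LogisticRegression(), X_train, y_train, cv=5)
```

The test set stays untouched until the very end; all model selection happens inside the cross-validation loop on `X_train`.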
The implementation may be a little trickier, but it should not be prohibitive given the many code examples available, assuming you are using TensorFlow or PyTorch (see e.g. this SO question).
Compared to a single validation set, k-fold cross-validation avoids over-fitting hyperparameters to one fixed validation split and makes better use of the available data, since every training example is used for validation in exactly one fold, albeit at greater computational cost. Whether this makes a big difference depends on your task. 2000 images is not a lot by computer-vision standards, so making good use of the data may well be relevant to you, especially if you plan on tuning hyperparameters.
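One detail worth carrying over from your current setup: since you split by patient to prevent leakage, the cross-validation folds should also be grouped by patient. scikit-learn's `GroupKFold` does exactly this. A sketch with hypothetical patient IDs:

```python
# Patient-level k-fold CV: GroupKFold guarantees that all samples
# from a given patient land in the same fold, so no patient ever
# appears in both the training and validation split of a fold.
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
n = 120
X = rng.normal(size=(n, 8))              # stand-in features
y = rng.integers(0, 2, size=n)           # binary labels
patients = rng.integers(0, 30, size=n)   # hypothetical patient IDs

gkf = GroupKFold(n_splits=5)
for fold, (tr, va) in enumerate(gkf.split(X, y, groups=patients)):
    # Verify leakage prevention: patient sets are disjoint per fold
    assert set(patients[tr]).isdisjoint(patients[va])
```

The same `groups` array can be passed to `cross_val_score` via its `groups` argument if you combine this with the workflow above.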