image-processing computer-vision turi-create

Training on a large set of high-res pictures with Turicreate runs out of memory


I'm trying to use Turicreate to train a model on around 150 fairly high-res pictures (about 4 MB each, 3000×5000 pixels). I'm running

model = tc.object_detector.create(train_data, max_iterations=10)

and after a while I get a 'low virtual memory' warning, and right after that my computer freezes.

I was wondering what the best practice is for training on a set of pictures like this.

Full code that I'm using:

import turicreate as tc

# Load the annotated images and split into train/test sets
data = tc.SFrame('annotations.sframe')
train_data, test_data = data.random_split(0.8)

# Train the object detector, evaluate it, then save and export to Core ML
model = tc.object_detector.create(train_data, max_iterations=10)
predictions = model.predict(test_data)
metrics = model.evaluate(test_data)
model.save('mymodel.model')
model.export_coreml('MyCustomObjectDetector.mlmodel')

Solution

  • Normally you would reduce the batch size, i.e. how large a portion of the training data is used for one iteration. That is apparently not easy to tweak in Turicreate yet, so the program seems to push the whole dataset through a single iteration. Ideally you would use a smaller portion, for example 32 or 64 images. There is some discussion about this on GitHub, and batch size may be exposed as a public parameter in a future release; a sketch of what that call could look like is included after this list.

    3000 × 5000 pixels is also fairly large for this kind of work. You will probably want to downsize the images, e.g. with the bicubic interpolation implemented in SciPy. Depending on the kind of images you are working with, even a factor of 10 along each dimension might not be too much shrinkage; see the second sketch after this list.
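Here is a rough sketch of the training call once (or if) batch_size is exposed on object_detector.create; the parameter name is an assumption based on the GitHub discussion, so check whether your installed Turicreate version actually accepts it:

# Assumes a Turicreate release in which object_detector.create accepts a
# batch_size argument; smaller batches keep per-iteration memory use bounded.
model = tc.object_detector.create(train_data, batch_size=32, max_iterations=10)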
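For the downsizing, here is a minimal sketch that shrinks the images already loaded into the annotated SFrame and scales the bounding boxes by the same factor. It assumes the usual Turicreate layout (an 'image' column plus an 'annotations' column with pixel-coordinate x/y/width/height boxes), uses Turicreate's built-in image_analysis.resize rather than the SciPy bicubic resize mentioned above, and the shrink factor of 10 is only an example:

import turicreate as tc

FACTOR = 10  # example shrink factor; tune to your images

data = tc.SFrame('annotations.sframe')

# Downscale each image; dimensions vary per image, so loop in Python.
small_images = [
    tc.image_analysis.resize(img, img.width // FACTOR, img.height // FACTOR,
                             channels=3)
    for img in data['image']
]
data['image'] = tc.SArray(small_images)

# Scale the bounding boxes by the same factor so they still match the pixels.
def scale_boxes(annotations):
    scaled = []
    for ann in annotations:
        coords = ann['coordinates']
        scaled.append({
            'label': ann['label'],
            'coordinates': {k: v / FACTOR for k, v in coords.items()},
        })
    return scaled

data['annotations'] = data['annotations'].apply(scale_boxes)

data.save('annotations_small.sframe')

Training on the shrunken SFrame should then fit comfortably in memory, and the exported Core ML model will expect correspondingly smaller inputs.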