python, tensorflow, keras, neural-network, pruning

Why does my pruned model have a larger file size than my initial model?


I'm exploring pruning a neural network using this example. My pruning code, using a pre-trained model, looks like this:

import tempfile

import numpy as np
import tensorflow as tf
import tensorflow_model_optimization as tfmot
from tensorflow import keras

prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude

# Compute end step to finish pruning after 3 epochs.
batch_size = 64
epochs = 3
validation_split = 0.1  # 10% of the training set will be used as the validation set.

num_images = 114 * (1 - validation_split)
end_step = np.ceil(num_images / batch_size).astype(np.int32) * epochs

# Define model for pruning.
pruning_params = {
    'pruning_schedule': tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.50,
        final_sparsity=0.80,
        begin_step=0,
        end_step=end_step,
    )
}

pruned_model = prune_low_magnitude(model, **pruning_params)

# `prune_low_magnitude` requires a recompile.
pruned_model.compile(
    optimizer='adam',
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'],
)

logdir = tempfile.mkdtemp()

callbacks = [
    tfmot.sparsity.keras.UpdatePruningStep(),
    tfmot.sparsity.keras.PruningSummaries(log_dir=logdir),
]

# Note: `batch_size` must not be passed to `fit` when training on a
# tf.data.Dataset - the dataset is already batched.
pruned_model.fit(train_dataset, epochs=epochs, validation_data=valid_dataset, callbacks=callbacks)

Then, using pruned_model.evaluate(train_dataset, verbose=0), I can see that accuracy has indeed dropped a bit, as expected - these are the results from my last test run:

Baseline test accuracy: 0.9197102189064026
Pruned model test accuracy: 0.8976686000823975

I have been using model.save() to save both the initial model and the pruned pruned_model in .h5 and .keras formats. The initial model comes to 60.7 MB as .h5 and 60.9 MB as .keras. However, the pruned network comes to 85.4 MB as .h5, and the .keras version is even larger, at 110 MB. I can't find anything in the Keras documentation about needing to specify optimisation when saving a file - only the save location.
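
For reference, the saving and size check look roughly like this (a sketch - file names are illustrative):

import os

model.save('baseline.h5')
pruned_model.save('pruned.h5')

print(os.path.getsize('baseline.h5'), os.path.getsize('pruned.h5'))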


Solution

  • A pruned model can end up larger on disk for a few reasons. There is serialization overhead: the pruning wrappers add metadata needed to reconstruct the model and continue pruning, including per-layer mask and threshold variables on top of the original weights. There is sparse matrix storage: the .h5 and .keras formats store weight tensors densely, so the zeroed weights take exactly as much space as non-zero ones. And there is checkpoint information: the pruning state (e.g. the current pruning step) is saved alongside the weights.
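
    You can see the extra variables directly - a minimal check, assuming the model and pruned_model from the question:

    # The pruning wrappers add mask and threshold variables for each
    # prunable layer, so the wrapped model carries many more variables.
    print(len(model.weights), len(pruned_model.weights))

    for w in pruned_model.weights:
        if 'mask' in w.name or 'threshold' in w.name:
            print(w.name, w.shape)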

    To resolve this, you can use efficient storage formats, e.g. .tflite with quantization:

    # Strip the pruning wrappers before converting; `Optimize.DEFAULT`
    # applies dynamic-range quantization during conversion.
    final_model = tfmot.sparsity.keras.strip_pruning(pruned_model)

    converter = tf.lite.TFLiteConverter.from_keras_model(final_model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()

    with open('quantized_model.tflite', 'wb') as f:
        f.write(tflite_model)
    

    And/or strip the pruning-specific wrappers, e.g. using strip_pruning (see documentation), before saving in .h5/.keras - a sketch of this follows below.
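
    Stripping removes the wrapper variables, and because the remaining zeros compress well, pairing it with ordinary gzip shows the actual size reduction - a minimal sketch, with illustrative file names:

    import gzip
    import os

    # Remove the pruning wrappers (masks, thresholds, pruning step).
    stripped_model = tfmot.sparsity.keras.strip_pruning(pruned_model)
    stripped_model.save('stripped_model.h5')

    # Dense formats store the zeros verbatim; compression exploits them.
    with open('stripped_model.h5', 'rb') as f:
        gz_bytes = gzip.compress(f.read())
    with open('stripped_model.h5.gz', 'wb') as f:
        f.write(gz_bytes)

    print(os.path.getsize('stripped_model.h5'), os.path.getsize('stripped_model.h5.gz'))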