I have a model based on TableNet and VGG19. The training data (Marmot) and the saving path are mapped to data lake storage (on Azure).
I'm trying to save it in the following ways and get the following errors on Databricks:
First approach:
import pickle
pickle.dump(model, open(filepath, 'wb'))
This saves the model and gives the following output:
WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 5 of 31). These functions will not be directly callable after loading.
Now when I try to reload the model using:
loaded_model = pickle.load(open(filepath, 'rb'))
I get the following error (Databricks shows the entire stderr and stdout in addition to this error, but the following is the gist):
ValueError: Unable to restore custom object of type _tf_keras_metric. Please make sure that any
custom layers are included in the `custom_objects` arg when calling `load_model()` and make
sure that all layers implement `get_config` and `from_config`.
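For context, this is roughly what that error is asking for: every custom layer or metric must be reconstructable, i.e. forward **kwargs in its constructor and implement get_config/from_config. A minimal sketch of the pattern, with a hypothetical custom layer (the class name and the units argument are illustrative, not my actual code):

import tensorflow as tf
from tensorflow import keras

class MyCustomLayer(keras.layers.Layer):
    def __init__(self, units=32, **kwargs):
        # forward **kwargs so Keras-managed arguments (name, dtype, ...) survive a reload
        super().__init__(**kwargs)
        self.units = units
        self.dense = keras.layers.Dense(units)

    def call(self, inputs):
        return self.dense(inputs)

    def get_config(self):
        # merge our constructor arguments into the base config so that
        # from_config(get_config()) rebuilds an identical layer
        config = super().get_config()
        config.update({"units": self.units})
        return config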
Second approach:
model.save(filepath)
and for this I get the following error:
Fatal error: The Python kernel is unresponsive.
The Python process exited with exit code 139 (SIGSEGV: Segmentation fault).
The last 10 KB of the process's stderr and stdout can be found below. See driver logs for full logs.
---------------------------------------------------------------------------
Last messages on stderr:
Mon Jan 9 08:04:31 2023 Connection to spark from PID 1285
Mon Jan 9 08:04:31 2023 Initialized gateway on port 36597
Mon Jan 9 08:04:31 2023 Connected to spark.
2023-01-09 08:05:53.221618: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
and much more. It is hard to find the actual error in all of that output, since Databricks shows the entire stderr and stdout, including everything from training, which makes it very hard to locate the problem.
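One workaround I have seen suggested for crashes like this on Databricks (I cannot confirm it is the root cause here) is to save to the driver's local disk first and only then copy the file to DBFS, since, as far as I understand, the DBFS FUSE mount does not support the random writes that HDF5 needs. A sketch, where the mount path is illustrative:

import shutil

local_path = "/tmp/model.h5"                 # driver-local disk
dbfs_path = "/dbfs/mnt/mydatalake/model.h5"  # hypothetical DBFS mount path

model.save(local_path)                       # save locally first
shutil.copy(local_path, dbfs_path)           # then copy onto the mount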
Third approach (partially):
I also tried:
model.save_weights(weights_path)
but once again I was unable to reload them (this approach was tried the least)
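For completeness, save_weights only round-trips if the exact same architecture is rebuilt first and the weights are loaded into it; here is a sketch, where build_model is a placeholder for whatever code constructs the TableNet/VGG19 graph:

# save only the weights, not the architecture
model.save_weights(weights_path)

# later, in a fresh session: rebuild the identical architecture first
new_model = build_model()              # placeholder for the model-building code
new_model.load_weights(weights_path)   # then restore the trained weights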
Also, I tried saving checkpoints by adding this:
model_checkpoint = tf.keras.callbacks.ModelCheckpoint(filepath, monitor="val_table_mask_loss", verbose=1, save_weights_only=True)
as a callback in the fit method (callbacks=[model_checkpoint]), but at the end of the first epoch it generates the following error (I show the end of the traceback):
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/h5f.pyx in h5py.h5f.create()
OSError: Unable to create file (file signature not found)
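The h5py traceback says the HDF5 file cannot even be created at that location, which again points at the DBFS filepath rather than at the callback itself. A variant of the same callback that checkpoints to local disk and copies afterwards might behave differently (paths and dataset names are illustrative):

import shutil
import tensorflow as tf

local_ckpt = "/tmp/checkpoint.h5"      # driver-local path
model_checkpoint = tf.keras.callbacks.ModelCheckpoint(
    local_ckpt,
    monitor="val_table_mask_loss",
    verbose=1,
    save_weights_only=True,
)
# train_ds / val_ds stand in for my actual datasets
model.fit(train_ds, validation_data=val_ds, callbacks=[model_checkpoint])
shutil.copy(local_ckpt, "/dbfs/mnt/mydatalake/checkpoint.h5")  # hypothetical mount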
When I use the second approach on a platform that is not Databricks it works fine, but then when I try to load the model I get an error similar to the loading error from the first approach.
My variable filepath that I try to save to is a dbfs reference, and my dbfs is mapped to the data lake storage.
When I try what was suggested in the comments, following the answer there, I get the following error:
----> 3 model2 = keras.models.load_model("/tmp/model-full2.h5")
...
ValueError: Unknown layer: table_mask. Please ensure this object is passed to the `custom_objects` argument. See https://www.tensorflow.org/guide/keras/save_and_serialize#registering_the_custom_object for details.
So I tried following the error message plus this answer:
model2 = keras.models.load_model("/tmp/model-full2.h5", custom_objects={'table_mask': table_mask})
but then I get the following error:
TypeError: 'KerasTensor' object is not callable
(apparently the name table_mask in my code referred to a KerasTensor output rather than to the layer class, which is what the suggestions below fix)
Try making the following changes to your custom object(s), so they can be properly serialized and deserialized:
Add the keyword arguments to your constructor:
class TableMask(keras.layers.Layer):
    def __init__(self, **kwargs):
        super(TableMask, self).__init__(**kwargs)
Rename table_mask to TableMask to avoid naming conflicts, so when you load your model it will look something like this:
model = keras.models.load_model("/tmp/path", custom_objects={'TableMask': TableMask, 'CustomObj2': CustomObj2, 'CustomMetric': CustomMetric})
We found a few errors in my code: for example, the __init__ functions of my custom objects were missing the **kwargs argument, as the answer suggests. Also, I used the answer that @AloneTogether suggested in the comments (that answer is the way I chose to save and load the model, plus the extra fixes listed above).
After all that, saving, loading, and predicting worked great.
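For anyone hitting the same chain of errors, this is roughly the flow that ended up working for me (TableMask is the renamed layer with the **kwargs fix from above; the other custom-object names are the placeholders from the quoted answer, and sample_batch stands in for real input):

from tensorflow import keras

# save to local disk; the file can then be copied to the data lake mount
model.save("/tmp/model-full2.h5")

# reload, registering every custom object under its class name
loaded_model = keras.models.load_model(
    "/tmp/model-full2.h5",
    custom_objects={
        "TableMask": TableMask,
        "CustomObj2": CustomObj2,       # placeholder from the answer
        "CustomMetric": CustomMetric,   # placeholder from the answer
    },
)
predictions = loaded_model.predict(sample_batch)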