After running the code

train(train_data, EPOCHS)

I'm getting the error

AttributeError: module 'tensorflow.python.distribute.input_lib' has no attribute 'DistributedDatasetInterface'

and after tracing the error, it points at line 15 (marked with an arrow below) of:
def train(data, EPOCHS):
    # Loop through epochs
    for epoch in range(1, EPOCHS+1):
        print('\n Epoch {}/{}'.format(epoch, EPOCHS))
        progbar = tf.keras.utils.Progbar(len(data))

        # Create the metric objects
        r = Recall()
        p = Precision()

        # Loop through each batch
        for idx, batch in enumerate(data):
            # Run train step here
            loss = train_step(batch)
            yhat = siamese_model.predict(batch[:2])  # <--- line 15
            r.update_state(batch[2], yhat)
            p.update_state(batch[2], yhat)
            progbar.update(idx+1)
        print(loss.numpy(), r.result().numpy(), p.result().numpy())

        # Save checkpoints
        if epoch % 10 == 0:
            checkpoint.save(file_prefix=checkpoint_prefix)
Your error happens because of a mismatch between your installed tensorflow and keras versions: predict() internally checks whether its input is a distributed dataset, and the DistributedDatasetInterface attribute it looks up was removed from tensorflow.python.distribute.input_lib in newer TensorFlow releases. TensorFlow has changed some of its APIs across versions; if you also want background on how distributed training actually works, see the guide Distributed training with TensorFlow.
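As a quick sanity check (a minimal sketch, assuming the version-mismatch cause described above), you can print both versions and confirm they come from the same release:

# Minimal version check, assuming the AttributeError comes from a
# tensorflow/keras version mismatch (see the linked issue below)
import tensorflow as tf
import keras

print(tf.__version__)     # e.g. 2.13.0
print(keras.__version__)  # should match the Keras bundled with your TF release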
Try a direct model call

The simplest fix, which may solve your problem, is to call your model directly instead of using the predict() method:
# Change this line
yhat = siamese_model.predict(batch[:2])
# To this
yhat = siamese_model(batch[:2], training=False)
This works because it bypasses the predict() machinery that's causing the error: a direct call just runs a forward pass and never touches the Keras data-handling code that looks up DistributedDatasetInterface.
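For context, here is how that change looks inside your inner batch loop (a sketch using the names from your code). The direct call returns a tf.Tensor rather than a NumPy array, which the Keras metric objects accept without changes:

# Inner batch loop with the direct call (names taken from the question)
for idx, batch in enumerate(data):
    loss = train_step(batch)
    # Direct call runs a plain forward pass; training=False keeps layers
    # such as dropout and batch norm in inference mode
    yhat = siamese_model(batch[:2], training=False)
    r.update_state(batch[2], yhat)  # metrics accept tensors directly
    p.update_state(batch[2], yhat)
    progbar.update(idx + 1)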
Also see this GitHub issue: issue 61900.