I'm learning Python and machine learning. I reproduced some published code from a Kaggle notebook and modified it for my own data, running it in Azure Data Studio with Python 3. (The externally hosted code was removed as requested in the comments.)
The code works: it splits the data into a training set (80%) and a testing/validation set (20%), then trains the TensorFlow model, which completes successfully. A plot labelled "Training and validation precision" then appears with correct-looking data, followed by another plot labelled "Training and validation loss", which also looks correct.
I would now like to get the actual underlying predictions made by the model, but I cannot find where they are stored.
Running this code:
print(y_test)
only shows the historical predictors, as does the following (both show identical information):
print(y_won)
As the precision is not 100%, neither one can be the actual calculated predicted value.
I also tried
print(y_true)
and
print(y_pred)
but these raise the errors "'y_true' is not defined" and "'y_pred' is not defined" respectively.
I'm sure it's probably a simple issue, largely due to my lack of knowledge of the underlying data structures. Any help is greatly appreciated.
The full unmodified code is in the link provided above.
This is the section of the linked code that deals specifically with the model:
import matplotlib.pyplot as plt
import tensorflow as tf
from sklearn import model_selection

# X (features) and y_won (labels) are built earlier in the notebook (not shown)

# split data into train and test sets
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y_won, train_size=0.8, test_size=0.2, random_state=1)

# one hidden layer, then a 16-way softmax output (one probability per class)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(112, activation='relu', input_shape=(112,)),
    tf.keras.layers.Dense(16, activation='softmax')
])

model.compile(optimizer=tf.keras.optimizers.Adam(5e-04),
              loss=tf.keras.losses.CategoricalCrossentropy(),
              metrics=[tf.keras.metrics.Precision(name='precision')])

# wrap the train and test splits as shuffled, batched tf.data datasets
dataset = tf.data.Dataset.from_tensor_slices((X_train.values, y_train.values))
train_dataset = dataset.shuffle(len(X_train)).batch(500)
dataset = tf.data.Dataset.from_tensor_slices((X_test.values, y_test.values))
validation_dataset = dataset.shuffle(len(X_test)).batch(500)

print("Start training..\n")
history = model.fit(train_dataset, epochs=200, validation_data=validation_dataset)
print("Done.")

# per-epoch metrics recorded during training
precision = history.history['precision']
val_precision = history.history['val_precision']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(precision) + 1)

plt.plot(epochs, precision, 'b', label='Training precision')
plt.plot(epochs, val_precision, 'r', label='Validation precision')
plt.title('Training and validation precision')
plt.legend()

plt.figure()
plt.plot(epochs, loss, 'b', label='Training loss')
plt.plot(epochs, val_loss, 'r', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()
From what I understand, you want to obtain the actual predictions made by the TensorFlow Keras model. You can use the model.predict() function like this:
predictions = model.predict(X_test)
To get the class with the highest probability (i.e., the actual predicted class), you can use the np.argmax() function like this:
predicted_classes = np.argmax(predictions, axis=1)
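To make the argmax step concrete, here is a minimal sketch using only NumPy. The `predictions` array is a made-up stand-in for what model.predict() returns from a softmax output layer (one row per sample, one column per class); it uses 4 classes instead of your 16 just to keep it readable, and the `confidence` name is my own addition.

```python
import numpy as np

# Stand-in for the output of model.predict(X_test): one row per sample,
# one column per class, each row holding softmax probabilities.
# These values are invented for illustration (3 samples, 4 classes).
predictions = np.array([
    [0.10, 0.70, 0.15, 0.05],
    [0.60, 0.20, 0.10, 0.10],
    [0.05, 0.05, 0.10, 0.80],
])

# axis=1 searches across the class columns, so we get one class index per row.
predicted_classes = np.argmax(predictions, axis=1)

# The probability the model assigned to its chosen class, per sample.
confidence = predictions.max(axis=1)

print(predicted_classes)  # [1 0 3]
print(confidence)
```

With your model, `predictions` should have shape (len(X_test), 16), so `predicted_classes` would hold one integer between 0 and 15 for each test row.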
For more detail on argmax, see https://www.tensorflow.org/api_docs/python/tf/math/argmax if you want to dive deeper.
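If you then want to line the predictions up against the real outcomes, here is a rough sketch. It assumes your y_test is one-hot encoded, which the 16-unit softmax layer combined with CategoricalCrossentropy suggests, but I can't see your data so treat that as an assumption. The small arrays are invented stand-ins for y_test.values and the output of model.predict(). Also note that y_true and y_pred are not created by Keras; they are just conventional names that you define yourself, which is why printing them raised a "not defined" error.

```python
import numpy as np

# Hypothetical stand-ins: in the question these would be
# y_test.values and the array returned by model.predict(...).
y_test_one_hot = np.array([
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
])
predictions = np.array([
    [0.10, 0.70, 0.15, 0.05],
    [0.60, 0.20, 0.10, 0.10],
    [0.05, 0.05, 0.10, 0.80],
])

# Decode both the one-hot labels and the probabilities to class indices.
y_true = np.argmax(y_test_one_hot, axis=1)
y_pred = np.argmax(predictions, axis=1)

# Element-wise comparison: True where the model picked the right class.
correct = y_true == y_pred
accuracy = correct.mean()

print("true:     ", y_true)
print("predicted:", y_pred)
print("accuracy: ", accuracy)
```

One thing to watch: predict on X_test itself rather than on validation_dataset. That dataset is shuffled, so the order of its predictions would most likely no longer match the row order of y_test.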