I'm currently attempting to create a model that outputs a probability distribution for a discrete numeric dataset.
I know that categorical cross-entropy is a classification loss function, but if I bucket my numeric data and one-hot encode the buckets, could I then output probabilities for each of those buckets?
Is my thinking here correct? If not, what are some alternatives I could use to tackle this problem?
I have read about some neural networks that predict a binomial distribution and believe I could go with that as well.
I am assuming that you have already pre-processed your target variable, i.e. mapped each class number to a bin number. Following the example you gave in the comments, with 10K classes and 100 bins, the mapping assigns each of the 10K classes to one of the 100 bins. Your one-hot encoding will then have `num_bins` elements, i.e. 100.
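Here is a minimal sketch of that pre-processing step, assuming evenly sized bins; the sizes and random targets below are placeholders for illustration:

```python
import numpy as np

# Placeholder sizes matching the example: 10K classes, 100 bins
num_classes = 10_000
num_bins = 100

# Hypothetical raw targets: integer class numbers in [0, num_classes)
y = np.random.randint(0, num_classes, size=1000)

# Map each class number to a bin, assuming evenly sized bins:
# classes 0-99 -> bin 0, classes 100-199 -> bin 1, and so on.
bins = y // (num_classes // num_bins)

# One-hot encode the bin numbers for use with categorical cross-entropy
y_one_hot = np.eye(num_bins)[bins]  # shape: (1000, 100)
```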
Then you just have to add a `softmax` activation in the output layer of your neural network, which will return the probability distribution over the bins, and you are good to go. To get the bin with the highest probability you just need to call `np.argmax(y_pred)` for a single instance, or `np.argmax(y_pred, axis=1)` for batch processing (I have used `numpy` assuming the output will be a `numpy.ndarray`, but you can change this as you see fit).
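Putting it together, here is a minimal Keras sketch; the input dimension and hidden-layer size are illustrative assumptions, not taken from your setup:

```python
import numpy as np
from tensorflow import keras

num_bins = 100

# Assumed 32-dimensional inputs and one hidden layer, purely for illustration.
model = keras.Sequential([
    keras.Input(shape=(32,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(num_bins, activation="softmax"),  # one probability per bin
])
model.compile(optimizer="adam", loss="categorical_crossentropy")

# After training, each row of y_pred sums to 1: a distribution over the bins.
x = np.random.rand(8, 32).astype("float32")
y_pred = model.predict(x)

best_bin = np.argmax(y_pred, axis=1)  # most likely bin for each instance
```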
NOTE: The neural network will only give the bin number. You will not get back the original class number from the bin number.
Additional Information: Check this link to see how Categorical Cross-Entropy is used: Keras Losses. `logits` is the input to the `softmax` activation function. If you are using this (or an implementation close to it), you can drop the `softmax` activation on the output layer and compute the loss from the logits directly by passing `from_logits=True` to the loss.
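As a sketch of that variant (same placeholder architecture as above):

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

num_bins = 100

# The final layer has no activation: it outputs raw logits, and the
# loss applies the softmax internally.
model = keras.Sequential([
    keras.Input(shape=(32,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(num_bins),  # linear output = logits
])
model.compile(
    optimizer="adam",
    loss=keras.losses.CategoricalCrossentropy(from_logits=True),
)

# At inference time, apply the softmax yourself to turn the logits
# back into a probability distribution over the bins.
x = np.random.rand(8, 32).astype("float32")
probs = tf.nn.softmax(model.predict(x), axis=-1)
```

This is numerically more stable than applying the softmax in the model and letting the loss take its log.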