I have created a simple neural network (Python, Theano) to estimate a persons age based on their spending history from a selection of different stores. Unfortunately, it is not particularly accurate.
The accuracy might be hurt by the fact that the network has no knowledge of ordinality. For the network there is no relationship between the age classifications. It is currently selecting the age with the highest probability from the softmax output layer.
I have considered changing the output classification to an average of the weighted probability for each age.
E.g Given age probabilities: (Age 10 : 20%, Age 20 : 20%, Age 30: 60%)
Rather than output: Age 30 (Highest probability)
Weighted Average: Age 24 (10*0.2+20*0.2+30*0.6 weighted average)
This solution feels sub optimal. Is there a better was to implement ordinal classification in neural networks, or is there a better machine learning method that can be implemented? (E.g logistic regression)
This problem came up in a previous Kaggle competition (this thread references the paper I mentioned in the comments).
The idea is that, say you had 5 age groups, where 0 < 1 < 2 < 3 < 4, instead of one-hot encoding them and using a softmax objective function, you can encode them into K-1 classes and use a sigmoid objective. So, as an example, your encodings would be
[0] -> [0, 0, 0, 0]
[1] -> [1, 0, 0, 0]
[2] -> [1, 1, 0, 0]
[3] -> [1, 1, 1, 0]
[4] -> [1, 1, 1, 1]
Then the net will learn the orderings.