machine-learning testing metrics training-data

Higher coefficient of determination values in the testing phase than in the training phase


I developed seven hybrid ML models that combine metaheuristic algorithms with an ANN. Interestingly, for most of these models the coefficient of determination (R²) is higher in the testing phase than in the training phase. What could explain this? Please include references to support your answer, if possible.


Solution

  • Several things can cause this:

    1- Small dataset: if your dataset is small, the testing subset contains only a few samples, so its R² is a noisy estimate. A particular split can then happen to place samples in the test set that the model predicts very well.

    2- Your testing data are very similar to your training data, so the test score is not an independent check of the model.

    3- Duplicate samples in the testing data.

    Draw cross-plots (predicted versus measured values) for your training and testing subsets and assess their behavior. Analyzing those figures should help you identify the reason.
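Two of the causes above can also be checked numerically. The following is only a minimal sketch under several assumptions: the data are synthetic, a plain least-squares linear model stands in for the hybrid ANN models, and the helper names (`r2`, `split_r2_gaps`, `duplicate_report`) are hypothetical, not from any library. It refits the model on many random splits to see how often the test R² exceeds the training R² purely by chance, and it counts repeated rows in the test subset and test rows that also appear in the training subset.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def split_r2_gaps(X, y, n_repeats=200, test_frac=0.2, seed=0):
    """Return (test R^2 - train R^2) for many random train/test splits.

    Assumption: a linear least-squares fit is used as a stand-in model.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = max(2, int(round(test_frac * n)))
    A = np.column_stack([X, np.ones(n)])  # append an intercept column
    gaps = []
    for _ in range(n_repeats):
        idx = rng.permutation(n)
        test, train = idx[:n_test], idx[n_test:]
        coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        gaps.append(r2(y[test], A[test] @ coef) - r2(y[train], A[train] @ coef))
    return np.array(gaps)

def duplicate_report(X_train, X_test, decimals=8):
    """Count repeated rows inside the test set and test rows found in training."""
    train_rows = {tuple(row) for row in np.round(X_train, decimals)}
    test_rows = [tuple(row) for row in np.round(X_test, decimals)]
    repeated_in_test = len(test_rows) - len(set(test_rows))
    shared_with_train = sum(row in train_rows for row in test_rows)
    return repeated_in_test, shared_with_train

# Synthetic small dataset (assumption: 40 samples, 3 features, noisy linear target).
rng = np.random.default_rng(42)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=1.0, size=40)

gaps = split_r2_gaps(X, y)
frac_test_higher = float(np.mean(gaps > 0))
print(f"Splits where test R^2 > train R^2: {frac_test_higher:.0%}")

# Deliberately contaminated split: two training rows and one repeated test row.
X_train = X[:30]
X_test = np.vstack([X[30:], X[:2], X[39:40]])
repeated_in_test, shared_with_train = duplicate_report(X_train, X_test)
print(f"Repeated test rows: {repeated_in_test}, shared with training: {shared_with_train}")
```

If a noticeable share of random splits gives a higher test R², a single split with that outcome is weak evidence of anything unusual, and reporting the mean and spread over repeated splits (or cross-validation) is more informative. To apply this to your own case, replace the synthetic arrays with your data and the linear fit with your trained models.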