I have observed in many articles and books that model selection is done before model tuning.
Model selection is generally done using some form of cross-validation, such as k-fold, where metrics are computed for several models and the best-performing one is chosen.
The selected model is then tuned to find its best hyperparameters.
My concern is that a model that was not selected might perform better with the right hyperparameters.
So why not tune all the models we are interested in first, and then select the best one by cross-validation?
It depends on the experimental set-up followed in each article/book, but in short, the correct way to perform model selection and hyperparameter optimisation in the same experiment is to use nested cross-validation.
You can have a look at this other question to learn more about this validation scheme; a minimal sketch is also shown below.
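A minimal sketch of nested cross-validation using scikit-learn, assuming a classification task; the estimators, parameter grids, and dataset are illustrative choices, not something prescribed above. The inner loop (GridSearchCV) tunes each candidate model, and the outer loop estimates the performance of "model + tuning" as a whole, so the comparison between models is not biased by the tuning itself.

```python
# Nested cross-validation sketch (illustrative models, grids, and data).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Candidate models, each wrapped in its own hyperparameter search (inner loop).
candidates = {
    "svm": GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}, cv=3),
    "random_forest": GridSearchCV(
        RandomForestClassifier(random_state=0),
        {"n_estimators": [100, 300], "max_depth": [None, 5]},
        cv=3,
    ),
}

# Outer loop: unbiased estimate of each tuned model's generalisation performance.
for name, search in candidates.items():
    outer_scores = cross_val_score(search, X, y, cv=5)
    print(f"{name}: {outer_scores.mean():.3f} +/- {outer_scores.std():.3f}")
```

The model you would report is the one with the best outer-loop score, and its hyperparameters are then re-tuned on the full training data before deployment.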
Note, however, that in some cases it can be acceptable to first do a general comparison of all the models and then optimise only the top-performing ones (see the second sketch below). But in a rigorous study this shortcut is far from ideal.
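A rough sketch of that "compare first, tune later" shortcut, again with illustrative models and grids of my own choosing: screen all candidates with default hyperparameters, then tune only the best-scoring one.

```python
# Two-stage shortcut: screen with defaults, then tune the winner (illustrative).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    "logistic_regression": LogisticRegression(max_iter=5000),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Step 1: quick cross-validated comparison with default hyperparameters.
mean_scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
best_name = max(mean_scores, key=mean_scores.get)

# Step 2: hyperparameter search only for the top performer.
grids = {
    "logistic_regression": {"C": [0.01, 0.1, 1, 10]},
    "gradient_boosting": {"n_estimators": [100, 300], "learning_rate": [0.05, 0.1]},
}
search = GridSearchCV(models[best_name], grids[best_name], cv=5).fit(X, y)
print(best_name, search.best_params_, search.best_score_)
```

The weakness of this shortcut is exactly the one raised in the question: a model that looks worse with default hyperparameters may never get the chance to show what it can do when tuned.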