I am coding my own models for a time but I saw huggingface and started using it. I wanted to know whether I should use the pretrained model or train model (the same hugging face model) with my own dataset. I am trying to make a question answering model.
I have dataset of 10k-20k questions.
The state-of-the-art approach is to take a pre-trained model that was pre-trained on tasks that are relevant to your problem and fine-tune the model on your dataset.
So assuming you have your dataset in English, you should take a pre-trained model on natural language English. You can then fine-tune it.
This will most likely work better than training from scratch, but you can experiment on your own. You can also load a model without the pre-trained weights in Huggingface.