I'm creating an application that uses GPT-4 (via the OpenAI API) for visual question answering. The problem is that tests exercising this module, which previously passed, have started failing consistently because the quality of the answers has declined. Is this expected behavior for GPT-4 (or the OpenAI API)?
If you use `gpt-4o` as a model identifier in your application, then yes, the underlying model might change.
Let's take `gpt-4o` as an example. Over time this alias has pointed to different dated snapshots, such as:

- `gpt-4o-2024-05-13`
- `gpt-4o-2024-08-06`
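You can see which snapshot the alias currently resolves to by inspecting the response. Here is a minimal sketch, assuming the official `openai` Python package (v1.x), `OPENAI_API_KEY` set in the environment, and a placeholder prompt and image URL; in my experience the response's `model` field reports the resolved dated snapshot:

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # generic alias: the underlying snapshot can change over time
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                # placeholder image URL for illustration
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.model)                        # e.g. "gpt-4o-2024-08-06" (the resolved snapshot)
print(response.choices[0].message.content)   # the model's answer
```

Logging `response.model` in your test runs makes it easy to spot when the alias has been repointed to a newer snapshot.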
Whether answer quality can actually decrease in newer models is a fair question. Providers use various benchmarks to guard against regressions, and blind-test ratings of AI models generally show that people rank newer models higher.
At the same time, it's easy to find anecdotal opinions that some previous models were better. At first, my impression too was that `gpt-4` was better than the newer `gpt-4o`, but when I did a blind test, I preferred the answers from `gpt-4o`.
If you prefer to stick to a tested legacy model version, you can still pin a more specific ID such as `gpt-4o-2024-05-13`.
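For example, here is a minimal sketch of pinning a dated snapshot in a visual question answering call so your tests always run against the same underlying model (again assuming the `openai` Python package v1.x; the prompt and image URL are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# Fixed snapshot instead of the moving "gpt-4o" alias
PINNED_MODEL = "gpt-4o-2024-05-13"

response = client.chat.completions.create(
    model=PINNED_MODEL,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the objects in this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

Keep in mind that dated snapshots are eventually retired, so pinning postpones, rather than eliminates, the need to re-validate your tests against newer models.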