OpenAI now allows us to fine-tune GPT-3.5 models. I have tested fine-tuning a model with my own dataset, but the problem is that the fine-tuned model generates answers randomly instead of answering correctly based on my custom dataset.
Is there any way to make the model only answer from my own fine-tuned dataset?
This is a completely wrong approach, as you've already figured out.
As stated in the official OpenAI documentation:
Some common use cases where fine-tuning can improve results:
- Setting the style, tone, format, or other qualitative aspects
- Improving reliability at producing a desired output
- Correcting failures to follow complex prompts
- Handling many edge cases in specific ways
- Performing a new skill or task that’s hard to articulate in a prompt
Fine-tuning is not about answering a specific question with a specific answer (i.e., fact) from the fine-tuning dataset.
You need to implement a vector similarity search, as stated in the official OpenAI documentation:
When should I use fine-tuning vs embeddings with retrieval?
Embeddings with retrieval is best suited for cases when you need to have a large database of documents with relevant context and information.
By default OpenAI’s models are trained to be helpful generalist assistants. Fine-tuning can be used to make a model which is narrowly focused, and exhibits specific ingrained behavior patterns. Retrieval strategies can be used to make new information available to a model by providing it with relevant context before generating its response. Retrieval strategies are not an alternative to fine-tuning and can in fact be complementary to it.
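To make "embeddings with retrieval" concrete, below is a minimal sketch of a vector similarity search. It assumes the openai>=1.x Python SDK, the text-embedding-ada-002 model, an OPENAI_API_KEY environment variable, and made-up example sentences; all of these are illustrative choices, not something mandated by the documentation quoted above.

```python
# Minimal vector similarity search: embed texts, compare them with cosine similarity.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    """Turn a piece of text into an embedding vector."""
    response = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return np.array(response.data[0].embedding)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

facts = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday to Friday, 9 am to 5 pm CET.",
]
question = "When can I get my money back?"

question_vector = embed(question)
scores = [cosine_similarity(question_vector, embed(fact)) for fact in facts]

# The fact with the highest score is the most relevant one to the question.
print(facts[int(np.argmax(scores))])
```

The key point: the facts live outside the model, and the model only sees whichever facts the similarity search selects, which is exactly what fine-tuning cannot guarantee.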
A term that you have most likely stumbled upon by this point if you're into AI is RAG (i.e., Retrieval-Augmented Generation). Read Nvidia's RAG explanation to better understand what RAG is:
To understand the latest advance in generative AI, imagine a courtroom.
Judges hear and decide cases based on their general understanding of the law. Sometimes a case — like a malpractice suit or a labor dispute — requires special expertise, so judges send court clerks to a law library, looking for precedents and specific cases they can cite.
Like a good judge, large language models (LLMs) can respond to a wide variety of human queries. But to deliver authoritative answers that cite sources, the model needs an assistant to do some research.
The court clerk of AI is a process called retrieval-augmented generation, or RAG for short.
RAGs use vector similarity search under the hood. Take a look at the visual representation of a RAG process below:
Image source: An introduction to RAG and simple/complex RAG by Chia Jeng Yang
Information is extracted from data sources (A), split into chunks (B), transformed into vectors (C), and inserted into a vector database (D). When a user asks a question, the question is transformed into a vector (1). This vector is then compared with the vectors inside the vector database (2). The most similar vectors (3) are passed to an LLM (4), which then returns an answer to the user (5).
In short, a RAG pipeline is just a vector similarity search placed in front of an LLM call.
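Here is a rough sketch of the same steps in code, assuming the openai>=1.x Python SDK. The in-memory list stands in for a real vector database, and the file name, chunk size, and model names are illustrative placeholders. The letters and numbers in the comments map to the labels in the diagram above.

```python
# A minimal sketch of the RAG flow pictured above.
import numpy as np
from openai import OpenAI

client = OpenAI()

# (A)-(B) Extract information from a data source and split it into chunks
raw_text = open("my_knowledge_base.txt", encoding="utf-8").read()
chunks = [raw_text[i:i + 500] for i in range(0, len(raw_text), 500)]

# (C)-(D) Transform the chunks into vectors and insert them into a (toy) vector store
def embed(texts: list[str]) -> list[np.ndarray]:
    response = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return [np.array(item.embedding) for item in response.data]

vector_store = list(zip(chunks, embed(chunks)))

# (1)-(3) Transform the question into a vector and find the most similar chunks
question = "What is the refund policy?"
question_vector = embed([question])[0]

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

top_chunks = sorted(
    vector_store, key=lambda cv: cosine(question_vector, cv[1]), reverse=True
)[:3]
context = "\n\n".join(chunk for chunk, _ in top_chunks)

# (4)-(5) Pass the relevant chunks to the LLM and return its answer to the user
completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Answer only from the provided context. "
                                      "If the answer is not in the context, say you don't know."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(completion.choices[0].message.content)
```

In production you would swap the in-memory list for a proper vector database and a smarter chunking strategy, but the flow stays the same.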
You have at least the following three options if you want your LLM to answer a specific question with a specific answer (i.e., fact):