OpenAI now allows us to fine-tune GPT-3.5 models. I have tested fine-tuning a model with my own dataset, but the problem is that the fine-tuned model generates answers randomly instead of answering correctly based on my custom dataset.
Is there any way to make the model only answer from my own fine-tuned dataset?
This is a completely wrong approach, as you've already figured out.
As stated in the official OpenAI documentation:
Some common use cases where fine-tuning can improve results:
- Setting the style, tone, format, or other qualitative aspects
- Improving reliability at producing a desired output
- Correcting failures to follow complex prompts
- Handling many edge cases in specific ways
- Performing a new skill or task that’s hard to articulate in a prompt
Fine-tuning is not about answering a specific question with a specific answer (i.e., fact) from the fine-tuning dataset.
You need to implement a vector similarity search, as stated in the official OpenAI documentation:
When should I use fine-tuning vs embeddings with retrieval?
Embeddings with retrieval is best suited for cases when you need to have a large database of documents with relevant context and information.
By default OpenAI’s models are trained to be helpful generalist assistants. Fine-tuning can be used to make a model which is narrowly focused, and exhibits specific ingrained behavior patterns. Retrieval strategies can be used to make new information available to a model by providing it with relevant context before generating its response. Retrieval strategies are not an alternative to fine-tuning and can in fact be complementary to it.
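To make "embeddings with retrieval" concrete, below is a minimal sketch of a vector similarity search. It assumes the openai>=1.x Python SDK, the text-embedding-ada-002 model, an OPENAI_API_KEY environment variable, and made-up example sentences; all of these are illustrative choices, not something mandated by the documentation quoted above.

```python
# Minimal vector similarity search: embed texts, compare them with cosine similarity.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    """Turn a piece of text into an embedding vector."""
    response = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return np.array(response.data[0].embedding)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

facts = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday to Friday, 9 am to 5 pm CET.",
]
question = "When can I get my money back?"

question_vector = embed(question)
scores = [cosine_similarity(question_vector, embed(fact)) for fact in facts]

# The fact with the highest score is the most relevant one to the question.
print(facts[int(np.argmax(scores))])
```

The key point: the facts live outside the model, and the model only sees whichever facts the similarity search selects, which is exactly what fine-tuning cannot guarantee.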
A term that you have most likely stumbled upon by this point if you're into AI is RAG (i.e., Retrieval-Augmented Generation). Read Nvidia's RAG explanation to better understand what RAG is:
To understand the latest advance in generative AI, imagine a courtroom.
Judges hear and decide cases based on their general understanding of the law. Sometimes a case — like a malpractice suit or a labor dispute — requires special expertise, so judges send court clerks to a law library, looking for precedents and specific cases they can cite.
Like a good judge, large language models (LLMs) can respond to a wide variety of human queries. But to deliver authoritative answers that cite sources, the model needs an assistant to do some research.
The court clerk of AI is a process called retrieval-augmented generation, or RAG for short.
RAGs use vector similarity search under the hood. Take a look at the visual representation of a RAG process below:
Image source: An introduction to RAG and simple/complex RAG by Chia Jeng Yang
Information is extracted from data sources (A), split into chunks (B), transformed into vectors (C), and inserted into a vector database (D). When a user asks a question, the question is transformed into a vector (1). This vector is then compared with the vectors inside the vector database (2). The most similar vectors (3) are passed to an LLM (4), which then returns an answer to the user (5).
In short, a RAG pipeline is just a vector similarity search placed in front of an LLM call.
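Here is a rough sketch of the same steps in code, assuming the openai>=1.x Python SDK. The in-memory list stands in for a real vector database, and the file name, chunk size, and model names are illustrative placeholders. The letters and numbers in the comments map to the labels in the diagram above.

```python
# A minimal sketch of the RAG flow pictured above.
import numpy as np
from openai import OpenAI

client = OpenAI()

# (A)-(B) Extract information from a data source and split it into chunks
raw_text = open("my_knowledge_base.txt", encoding="utf-8").read()
chunks = [raw_text[i:i + 500] for i in range(0, len(raw_text), 500)]

# (C)-(D) Transform the chunks into vectors and insert them into a (toy) vector store
def embed(texts: list[str]) -> list[np.ndarray]:
    response = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return [np.array(item.embedding) for item in response.data]

vector_store = list(zip(chunks, embed(chunks)))

# (1)-(3) Transform the question into a vector and find the most similar chunks
question = "What is the refund policy?"
question_vector = embed([question])[0]

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

top_chunks = sorted(
    vector_store, key=lambda cv: cosine(question_vector, cv[1]), reverse=True
)[:3]
context = "\n\n".join(chunk for chunk, _ in top_chunks)

# (4)-(5) Pass the relevant chunks to the LLM and return its answer to the user
completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Answer only from the provided context. "
                                      "If the answer is not in the context, say you don't know."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(completion.choices[0].message.content)
```

In production you would swap the in-memory list for a proper vector database and a smarter chunking strategy, but the flow stays the same.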
You have at least the following three options if you want your LLM to answer a specific question with a specific answer (i.e., fact):