openai-apilangchainchatgpt-apilanguage-modelllama-index

OpenAI Fine-tuning API: Why would I use LlamaIndex or LangChain instead of fine-tuning a model?


I'm just getting started with working with LLMs, particularly OpenAIs and other OSS models. There are a lot of guides on using LlamaIndex to create a store of all your documents and then query on them. I tried it out with a few sample documents, but discovered that each query gets super expensive quickly. I think I used a 50-page PDF document, and a summarization query cost me around 1.5USD per query. I see there's a lot of tokens being sent across, so I'm assuming it's sending the entire document for every query. Given that someone might want to use thousands of millions of records, I can't see how something like LlamaIndex can really be that useful in a cost-effective manner.

On the other hand, I see OpenAI allows you to train a ChatGPT model. Wouldn't that, or using other custom trained LLMs, be much cheaper and more effective to query over your own data? Why would I ever want to set up LlamaIndex?


Solution

  • TL;DR: Use LlamaIndex or LangChain to get an exact answer (i.e., a fact) to a specific question from existing data sources.

    Why choose LlamaIndex or LangChain over fine-tuning a model?

    The answer is simple, but you couldn't answer it yourself because you were only looking at the costs. There are other aspects as well, not just costs. Take a look at the usability side of the question.

    Fine-tuning a model will give the model additional general knowledge, but the fine-tuned model will not give you an exact answer (i.e., a fact) to a specific question.

    People train an OpenAI model with some data, but when they ask it something related to the fine-tuning data, they are surprised that the model doesn't answer with the knowledge gained by fine-tuning. See an example explanation on the official OpenAI forum by @juan_olano:

    I fine-tuned a 70K-word book. My initial expectation was to have the desired QA, and at that point I didn’t know any better. But this fine-tuning showed me the limits of this approach. It just learned the style and stayed more or less within the corpus, but hallucinated a lot.

    Then I split the book into sentences and worked my way through embeddings, and now I have a very decent QA system for the book, but for narrow questions. It is not as good for questions that need the context of the entire book.

    Also, see the official OpenAI documentation:

    Some common use cases where fine-tuning can improve results:

    • Setting the style, tone, format, or other qualitative aspects
    • Improving reliability at producing a desired output
    • Correcting failures to follow complex prompts
    • Handling many edge cases in specific ways
    • Performing a new skill or task that’s hard to articulate in a prompt

    LlamaIndex or LangChain enable you to connect OpenAI models with your existing data sources. For example, a company has a bunch of internal documents with various instructions, guidelines, rules, etc. LlamaIndex or LangChain can be used to query all those documents and give an exact answer to an employee who needs an answer.

    OpenAI models (GPT-3, GPT-3.5, GPT-4, etc.) can't query their knowledge. Querying requires calculating embedding vectors from a resource and then calculating cosine similarity, which OpenAI models can't do. An OpenAI model simply gives an answer based on the statistical probability of which word should follow the previous one.

    I strongly suggest you read my previous answer regarding semantic search. You'll understand this answer better.