python, caching, artificial-intelligence, chatbot, langchain

Cache updates in chatbot when adding new documents


I'm building a chatbot to answer legal-related questions. I'm facing an issue with caching questions and responses — the goal is to retrieve an answer when someone asks a similar question that's already been saved. However, when I add new documents to the chatbot, the previously cached questions don't include information from these new files, so the responses become outdated and don't get updated accordingly.

I've thought of two solutions:

  1. When a cached question is asked, the system checks whether the number of indexed documents has changed. If it has, it retrieves content from the newly added files; if relevant content is found, it generates a new answer that combines the new information with the previous response and updates the cache.

  2. When new files are added, a separate process is triggered to update the cached responses.

Both solutions raise concerns about the chatbot's performance. What approach would you recommend to keep the cache up-to-date without degrading performance?


Solution

  • Here is a very naive approach to such a scenario. Our assumptions are:

    1. Your vector store initially contains N documents, and any query for which relevant context exists is cached.

    2. When the vector store is updated (i.e. the document count changes), you want the cached entries to be updated as well.

    from langchain_core.caches import BaseCache, RETURN_VAL_TYPE
    from typing import Any, Dict, Optional
    
    
    class MetadataAwareCache(BaseCache):
        def __init__(self, doc_count: int):
            super().__init__()
            self._cache = {}
            self._doc_count = doc_count
    
        # Call this whenever documents are added to (or removed from) the vector store
        def update_document_count(self, doc_count: int):
            if self._doc_count == doc_count:
                return
            self._doc_count = doc_count
            # Regenerate every cached entry against the updated document set
            for prompt, llm_string in list(self._cache.keys()):
                response, metadata = self.regenerate_cache(prompt, llm_string)
                self.update(prompt, llm_string, response, metadata)
    
        def regenerate_cache(self, prompt: str, llm_string: str):
            # Placeholder: in practice, re-run your retrieval + LLM chain here so
            # the answer reflects the newly added documents
            response = "New LLM Response"
            metadata = {}
            return response, metadata
    
        # Cache lookup
        def lookup(self, prompt: str, llm_string: str) -> Optional[RETURN_VAL_TYPE]:
            cache_entry = self._cache.get((prompt, llm_string))
            if cache_entry:
                return cache_entry["value"]
            return None
    
        # Update cache
        def update(
            self,
            prompt: str,
            llm_string: str,
            value: RETURN_VAL_TYPE,
            metadata: Optional[Dict[str, Any]] = None,
        ) -> None:
            self._cache[(prompt, llm_string)] = {
                "value": value,
                "metadata": metadata or {},
            }

        # Clear the cache (required by the BaseCache interface)
        def clear(self, **kwargs: Any) -> None:
            self._cache = {}
    

    To use the cache:

    from langchain.globals import set_llm_cache

    # Initialise with the current number of documents in the vector store
    initial_doc_count = 10  # replace with your actual document count
    cache = MetadataAwareCache(doc_count=initial_doc_count)
    set_llm_cache(cache)
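
    Whenever new documents are indexed, the cache needs to be told so it can refresh stale entries. A minimal sketch, assuming a standard LangChain vector store; vector_store and new_docs are illustrative names, not part of the original answer:

    # After indexing new documents (names here are illustrative)
    vector_store.add_documents(new_docs)
    initial_doc_count += len(new_docs)
    cache.update_document_count(initial_doc_count)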
    

    Considerations:

    1. If you are worried about performance and the cost of regenerating the cache, consider how frequently new information is likely to be added.
      If updates are few and far between, it can be cheaper to expire cache entries aggressively rather than pay to regenerate entries that may never be reused (see the TTL sketch below).

    2. If updates are more frequent, you could run the regeneration as a scheduled background process (see the scheduling sketch below).

    3. Alternatively, you can attach more metadata to each cached entry (for example, the document categories it drew on) and regenerate only the entries affected by the newly added documents (see the final sketch below).
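
    For the first consideration, here is a minimal sketch of time-based expiry that subclasses the cache above. The ttl_seconds parameter and the stored_at field are additions for illustration, not part of the original class:

    import time


    class ExpiringCache(MetadataAwareCache):
        """Variant that expires entries after a fixed TTL instead of regenerating them."""

        def __init__(self, doc_count: int, ttl_seconds: float = 3600.0):
            super().__init__(doc_count)
            self._ttl = ttl_seconds

        def update(self, prompt, llm_string, value, metadata=None):
            super().update(prompt, llm_string, value, metadata)
            # Record when the entry was written
            self._cache[(prompt, llm_string)]["stored_at"] = time.time()

        def lookup(self, prompt, llm_string):
            entry = self._cache.get((prompt, llm_string))
            if entry is None:
                return None
            if time.time() - entry["stored_at"] > self._ttl:
                # Entry is stale: drop it and treat the lookup as a miss
                del self._cache[(prompt, llm_string)]
                return None
            return entry["value"]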
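
    For the second consideration, here is a sketch of a scheduled refresh using only the standard library. get_vector_store_doc_count() is a hypothetical helper you would implement against your own vector store:

    import threading


    def refresh_cache_periodically(cache: MetadataAwareCache, interval_seconds: float = 600.0):
        """Poll the vector store on a timer and refresh the cache when the count changes."""

        def _tick():
            current_count = get_vector_store_doc_count()  # hypothetical helper
            cache.update_document_count(current_count)
            # Re-arm the timer for the next check
            threading.Timer(interval_seconds, _tick).start()

        threading.Timer(interval_seconds, _tick).start()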
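
    For the third consideration, here is a sketch of selective regeneration: tag each cached entry with the categories of the documents it drew on, and regenerate only the entries whose categories overlap with the newly added documents. The "categories" metadata key is an assumption, not something the class above already populates:

    class CategoryAwareCache(MetadataAwareCache):
        """Variant that only regenerates entries affected by the newly added documents."""

        def invalidate_categories(self, new_doc_categories: set) -> None:
            for (prompt, llm_string), entry in list(self._cache.items()):
                cached_categories = set(entry["metadata"].get("categories", []))
                # Only regenerate entries that drew on an affected category
                if cached_categories & new_doc_categories:
                    response, metadata = self.regenerate_cache(prompt, llm_string)
                    self.update(prompt, llm_string, response, metadata)

    For example, after indexing new employment-law documents you would call cache.invalidate_categories({"employment-law"}) and leave every other cached answer untouched.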