milvus langchain4j

Getting langchain4j to store ingested documents in milvus


I have the following code, which reads all files in /tmp/t3. This folder contains a single document that holds only my name and job. I then ask ChatGPT for my name, and that works fine: it returns my correct name, so the file in /tmp/t3 is being parsed.

This code reads all the files in /tmp/t3 every time it runs, which is inefficient. The process should really be split in two: first ingest all the files from /tmp/t3 into Milvus once, and then query them multiple times without having to ingest them again. So the question is: how do I do that? I thought I could just remove the call to ingest after the first run, but if I do that, langchain4j doesn't use the content of the files in /tmp/t3.

try (MilvusContainer milvus = new MilvusContainer("milvusdb/milvus:v2.4.5")) {
    milvus.start();
    EmbeddingStore<TextSegment> embeddingStore = MilvusEmbeddingStore.builder()
            .uri(milvus.getEndpoint())
            .collectionName("ingest5")
            .dimension(384) // must match the embedding model's vector size
            .build();

    List<Document> documents = FileSystemDocumentLoader.loadDocuments("/tmp/t3");
    EmbeddingStoreIngestor.ingest(documents, embeddingStore);
    
    ChatLanguageModel chatModel=OpenAiChatModel.builder()
            .apiKey("demo")
            .modelName(GPT_4_O_MINI)
            .build();
    
    Assistant assistant=AiServices.builder(Assistant.class)
        .chatLanguageModel(chatModel)
        .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
        .contentRetriever(EmbeddingStoreContentRetriever.from(embeddingStore))
        .build();
    
    String answer = assistant.chat("What is my name?");
    System.out.println("answer=" + answer);
}

Solution

  • The documentation is confusing, especially for users who don't know Docker.

    The MilvusContainer class (from Testcontainers) starts a fresh, throwaway container from the image each time it is run, and that container is deleted again when the test is done (here, when the try-with-resources block closes it). So no data is ever really persisted between runs.

    This may be obvious to people who are used to Docker, but it is not really documented. (I could not find any Javadoc for MilvusContainer.)

    So the solution is to remove all usage of MilvusContainer, run Milvus as a separate, long-lived server, and create the embedding store by pointing it at that server:

        embeddingStorePriv=MilvusEmbeddingStore.builder()
                .uri("http://localhost")
                .collectionName(milvusCollectionName)
                .dimension(dimSize)
                .build();
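
To make the split between ingesting and querying concrete, here is a minimal sketch that reuses the calls from the question against a standalone Milvus server. Several things in it are assumptions rather than part of the original code: the class name MilvusRagSketch, the isIngestMode helper, the "ingest" command-line argument, the shape of the Assistant interface (the question does not show it), and the URI http://localhost:19530 (19530 is the default Milvus port, but your server may be exposed differently). It also assumes the langchain4j, langchain4j-open-ai and langchain4j-milvus artifacts, plus whichever embedding model the original setup relied on, are on the classpath.

```java
import static dev.langchain4j.model.openai.OpenAiChatModelName.GPT_4_O_MINI;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.openai.OpenAiChatModel;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.milvus.MilvusEmbeddingStore;
import java.util.List;

public class MilvusRagSketch {

    // Assumed shape of the Assistant interface used in the question.
    interface Assistant {
        String chat(String message);
    }

    // Assumption: a standalone Milvus server is already running here,
    // started outside of this program so that its data outlives each run.
    static final String MILVUS_URI = "http://localhost:19530";

    // Hypothetical helper: "ingest" as the first argument selects ingestion.
    static boolean isIngestMode(String[] args) {
        return args.length > 0 && args[0].equals("ingest");
    }

    // Both phases open the same collection on the same server.
    static EmbeddingStore<TextSegment> openStore() {
        return MilvusEmbeddingStore.builder()
                .uri(MILVUS_URI)
                .collectionName("ingest5")
                .dimension(384) // must match the embedding model's vector size
                .build();
    }

    public static void main(String[] args) {
        EmbeddingStore<TextSegment> embeddingStore = openStore();

        if (isIngestMode(args)) {
            // Phase 1: run once to embed and store the documents.
            List<Document> documents = FileSystemDocumentLoader.loadDocuments("/tmp/t3");
            EmbeddingStoreIngestor.ingest(documents, embeddingStore);
            return;
        }

        // Phase 2: run as often as needed; nothing is re-ingested.
        ChatLanguageModel chatModel = OpenAiChatModel.builder()
                .apiKey("demo")
                .modelName(GPT_4_O_MINI)
                .build();

        Assistant assistant = AiServices.builder(Assistant.class)
                .chatLanguageModel(chatModel)
                .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
                .contentRetriever(EmbeddingStoreContentRetriever.from(embeddingStore))
                .build();

        System.out.println("answer=" + assistant.chat("What is my name?"));
    }
}
```

Under these assumptions, you would run the program once with the argument ingest and afterwards without arguments for each query. Running the ingest phase a second time would most likely store the same segments again, so it is worth doing only when the files in /tmp/t3 change.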