langchain

Loading text from a DB table into LangChain


I am just taking my first steps with LangChain. I have text in a SQLite table that I want to load and split into chunks. The LangChain docs say I can load a blob, but I can't work out how to pass the text I select from the table to the text splitter. Here is my (wrong) code, using better-sqlite3 and LangChain:

import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1000, 
    chunkOverlap: 200
});

// given CREATE TABLE t (id INTEGER PRIMARY KEY, fulltext TEXT);
const rows = db.prepare('SELECT id, fulltext FROM t').all();

for (const row of rows) {
   const docs = [ { metadata: row.id, pageContent: row.fulltext } ];
   const chunks = await splitter.splitDocuments(docs);
}

// Error
file:///Users/punkish/Projects/zai/node_modules/@langchain/textsplitters/dist/text_splitter.js:102
                const loc = _metadatas[i].loc && typeof _metadatas[i].loc === "object"
                                          ^

TypeError: Cannot read properties of undefined (reading 'loc')

Solution

  • Found the answer. I had to call splitter.splitText(row.fulltext), which takes a plain string, instead of splitter.splitDocuments().
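
One thing splitText() does not do is carry the row id along, since it only returns strings. Below is a minimal sketch of one way to keep each chunk tied to its row. The helper name chunkRows and the { id, index, text } output shape are hypothetical, not part of LangChain. The splitter is passed in as a function so the helper does not depend on any particular library.

```javascript
// Hypothetical helper (not a LangChain API): pairs every chunk with the id
// of the row it came from.
//   rows      - array of { id, fulltext }, e.g. the result of .all() above
//   splitText - any async function (string) => string[]
async function chunkRows(rows, splitText) {
  const out = [];
  for (const row of rows) {
    // One call per row, so chunks never span two rows.
    const chunks = await splitText(row.fulltext);
    chunks.forEach((text, index) => {
      out.push({ id: row.id, index, text });
    });
  }
  return out;
}

// Assumed usage with the splitter and rows from the question:
//   const chunks = await chunkRows(rows, (text) => splitter.splitText(text));
```

If you would rather stay with splitDocuments(), note that a document's metadata is meant to be an object, so something like metadata: { id: row.id } is probably closer to what it expects than a bare row.id. I have not verified that this alone removes the error above.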