pythonpostgresqlvectorizationlangchain

Type "vector" does not exist on postgresql - langchain


I was trying to embed some documents on postgresql with the help of pgvector extension and langchain. Unfortunately I'm having trouble with the following error:

(psycopg2.errors.UndefinedObject) type "vector" does not exist
LINE 4:  embedding VECTOR(1536), 
                   ^

[SQL: 
CREATE TABLE langchain_pg_embedding (
    collection_id UUID, 
    embedding VECTOR(1536), 
    document VARCHAR, 
    cmetadata JSON, 
    custom_id VARCHAR, 
    uuid UUID NOT NULL, 
    PRIMARY KEY (uuid), 
    FOREIGN KEY(collection_id) REFERENCES langchain_pg_collection (uuid) ON DELETE CASCADE
)
]

My environment info:


List of installed extensions on postgres

  Name   | Version |   Schema   |                Description                 
---------+---------+------------+--------------------------------------------
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
 vector  | 0.4.1   | public     | vector data type and ivfflat access method

I've tried the following ways to resolve:

  1. Fresh installing the Postgres docker image with pgvector extension enabled.
  2. Manually install the extension with the official instruction.
  3. Manually install the extension on Postgres like the following:
CREATE EXTENSION IF NOT EXISTS vector
    SCHEMA public
    VERSION "0.4.1";

But no luck.


Solution

  • Update 17th July 2023

    As previously I mentioned my issue was somewhere else in my configuration, here is the other reason that may be responsible for the error,

    1. The pgvector extension isn't enabled in the database you are using. Make sure you run CREATE EXTENSION vector; in each database you are using for storing vectors.
    2. The vector schema is not in the search_path. Run SHOW search_path; to see the available schemas in the search path and \dx to see the list of installed extensions with schemas.

    Unfortunately, the issue was somewhere else. My extension installation and search_path schema were totally okay for the defined database I was supposed to use. But my environment variable which was responsible for which database to use, got messed up and was using the default database postgres instead of my defined database, which didn't have the extension enabled.