python chromadb

chromaDB collection.query WHERE


This is how I pass values to my where parameter:

results = collection.query(
    query_texts=user_info,
    n_results=10,
    where={"date": "20-04-2023"},
)
print(results['metadatas'], results['distances'])

What if I want the range to start at "20-04-2023" and run until today? I would like to pass a date range to this parameter.


Solution

  • Store the date metadata as a Unix timestamp and use the $gte and $lte operators to filter documents.

    The conversion is needed because both operators work only on numeric values.

    Note that I introduced a separate "date_epoch" metadata field for the converted date, so the original "date" string (day, month, year) stays human-readable on the matching documents.

    import chromadb
    from datetime import datetime
    
    chroma_client = chromadb.Client()
    collection = chroma_client.get_or_create_collection(name="my_collection")
    
    
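    DATE_FORMAT = '%d-%m-%Y'  # day-month-year, e.g. "20-04-2023"
    
    def to_unix_epoch(date_str):
        # Parse the date string and return whole seconds since the Unix epoch.
        # The parsed datetime is naive, so it is interpreted in the local timezone.
        dt = datetime.strptime(date_str, DATE_FORMAT)
        return int(dt.timestamp())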
    
    
    collection.upsert(
        documents=[
            "This is a document about pineapple",
            "This is a document about oranges",
            "This is a document about mangoes"
        ],
        metadatas=[
            {"date": "20-04-2023", "date_epoch": to_unix_epoch("20-04-2023")},
            {"date": "19-04-2023", "date_epoch": to_unix_epoch("19-04-2023")},
            {"date": "21-04-2023", "date_epoch": to_unix_epoch("21-04-2023")},
        ],
        ids=["id1", "id2", "id3"]
    )
    
    # Inclusive range: 20-04-2023 <= date <= 21-04-2023
    results = collection.query(
        query_texts=["This is a document"],
        where={
            "$and": [
                {"date_epoch": {"$gte": to_unix_epoch("20-04-2023")}},
                {"date_epoch": {"$lte": to_unix_epoch("21-04-2023")}}
            ]
        },
    )
    
    print(results)
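
One caveat about the conversion above: `datetime.strptime` returns a naive datetime, and `timestamp()` interprets it in the machine's local timezone, so the stored integers can differ between machines in different timezones. If that matters for your data, a UTC-pinned variant is one option. This is only a sketch; the name `to_unix_epoch_utc` is hypothetical and not part of the code above.

```python
from datetime import datetime, timezone

DATE_FORMAT = "%d-%m-%Y"

def to_unix_epoch_utc(date_str):
    # Parse the DD-MM-YYYY string, then mark it explicitly as midnight UTC
    # so the result does not depend on the machine's local timezone.
    dt = datetime.strptime(date_str, DATE_FORMAT).replace(tzinfo=timezone.utc)
    return int(dt.timestamp())

print(to_unix_epoch_utc("20-04-2023"))  # 1681948800
# Consecutive days are exactly 86400 seconds apart in UTC
print(to_unix_epoch_utc("21-04-2023") - to_unix_epoch_utc("20-04-2023"))  # 86400
```

Whichever variant you pick, use the same one when writing the metadata and when building the query, otherwise the bounds will be off by the timezone offset.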
    

    To make the range end today, the query becomes:

    today = datetime.now().strftime(DATE_FORMAT)
    
    results = collection.query(
        # ...
        where={
            "$and": [
                {"date_epoch": {"$gte": to_unix_epoch("20-04-2023")}},
                {"date_epoch": {"$lte": to_unix_epoch(today)}}
            ]
        }
        #...
    )
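
If you build this filter in several places, it may be convenient to wrap it in a small helper. The sketch below assumes the same "date_epoch" field and DD-MM-YYYY format as above; `date_range_filter` and its parameters are hypothetical names introduced here for illustration, not part of chromadb.

```python
from datetime import datetime

DATE_FORMAT = "%d-%m-%Y"

def to_unix_epoch(date_str):
    # Same conversion as in the answer: local-time midnight as whole seconds
    dt = datetime.strptime(date_str, DATE_FORMAT)
    return int(dt.timestamp())

def date_range_filter(start, end=None, field="date_epoch"):
    # Build an inclusive [start, end] filter; the end defaults to today's date
    if end is None:
        end = datetime.now().strftime(DATE_FORMAT)
    return {
        "$and": [
            {field: {"$gte": to_unix_epoch(start)}},
            {field: {"$lte": to_unix_epoch(end)}},
        ]
    }

where = date_range_filter("20-04-2023", "21-04-2023")
print(where)
```

You would then pass it along as `where=date_range_filter("20-04-2023")` in `collection.query(...)`. Because every stored value is the midnight timestamp of its date, an upper bound of today's midnight still includes documents dated today.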