LlamaIndex Python: Metadata filter with `None` value does not retrieve documents

I’m working with LlamaIndex in Python and ran into an issue with metadata filtering.

I have a TextNode that includes a metadata field explicitly set to None. When I try to retrieve it using a metadata filter where value is None, no documents are returned. I expected that documents with None metadata would match such a filter.

Here's an MRE:

from llama_index.core import VectorStoreIndex
from llama_index.core.schema import TextNode
from llama_index.core.vector_stores import (
    MetadataFilter,
    MetadataFilters,
    FilterOperator,
)

node_01 = TextNode(
    text="This document has None in the metadata",
    id_="node_01",
    metadata={"start_date": None},
)

doc_index = VectorStoreIndex([node_01])

# Debug: Check what's actually stored
print("Index nodes:\n", [node.metadata for node in doc_index.docstore.docs.values()])

filter_null_start_date = MetadataFilter(key="start_date", operator=FilterOperator.EQ, value=None)
filters = MetadataFilters(filters=[filter_null_start_date])
retriever = doc_index.as_retriever(filters=filters, similarity_top_k=1)
nodes = retriever.retrieve("this")

print("Retrieved nodes:\n", [(node.node_id, node.metadata) for node in nodes])

Output:

Index nodes:
 [{'start_date': None}]
Retrieved nodes:
 []

So even though the metadata is stored as {'start_date': None}, filtering with EQ value=None does not return the node.

My questions:

Is this the expected behavior in LlamaIndex (i.e., None metadata is not filterable)?
If so, what is the recommended way to index “null” metadata values so they can be retrieved via filters?

Any clarification or workaround would be appreciated.

Solution

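Correct, this is expected behavior. With the default in-memory vector store (SimpleVectorStore, which VectorStoreIndex uses unless you configure another one), a metadata value of None is treated as a non-match, so an EQ filter with value=None never selects the node. The workaround is to store a non-None placeholder value and filter on that.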
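To illustrate why the filter in the question returns nothing, here is a simplified pure-Python model of an EQ comparison that rejects None. It is not LlamaIndex's source code: eq_filter_matches is a hypothetical function. It assumes, consistent with the behavior observed in the question, that a missing key and an explicit None are both treated as non-matches.

```python
from typing import Any, Dict


def eq_filter_matches(metadata: Dict[str, Any], key: str, value: Any) -> bool:
    """Hypothetical, simplified model of an EQ metadata filter.

    Mirrors the observed behavior only: a missing key or a None value
    never matches, whatever value the filter asks for.
    """
    metadata_value = metadata.get(key)
    if metadata_value is None:
        # Missing key and explicit None are both rejected up front.
        return False
    return metadata_value == value


stored = [
    {"start_date": None},
    {"start_date": "None"},
    {"start_date": ""},
]

for md in stored:
    print(
        md,
        "EQ None:", eq_filter_matches(md, "start_date", None),
        "| EQ 'None':", eq_filter_matches(md, "start_date", "None"),
        "| EQ '':", eq_filter_matches(md, "start_date", ""),
    )
```

Only the placeholder strings can ever produce a match, which is why both workarounds replace None with a real value.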

Instead of None, store the string "None" and filter on the same string:

node_01 = TextNode(
    text="This document has None in the metadata",
    id_="node_01",
    metadata={"start_date": "None"},
)

filter_null_start_date = MetadataFilter(key="start_date", operator=FilterOperator.EQ, value=str(None))

Alternatively, store an empty string "" and filter on that:

node_01 = TextNode(
    text="This document has None in the metadata",
    id_="node_01",
    metadata={"start_date": ""},
)

filter_null_start_date = MetadataFilter(key="start_date", operator=FilterOperator.EQ, value="")
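If the metadata comes from an upstream source that already contains None values, one way to apply either workaround consistently is to normalize each metadata dict before building the TextNode objects. The sketch below is plain Python so it runs without an index; normalize_metadata and NULL_SENTINEL are hypothetical names, not part of LlamaIndex.

```python
from typing import Any, Dict

# Hypothetical placeholder; "" works the same way if you prefer it.
NULL_SENTINEL = "None"


def normalize_metadata(metadata: Dict[str, Any], sentinel: str = NULL_SENTINEL) -> Dict[str, Any]:
    """Return a copy of metadata with every None value replaced by sentinel."""
    return {key: (sentinel if value is None else value) for key, value in metadata.items()}


raw_records = [
    ("node_01", "This document has None in the metadata", {"start_date": None}),
    ("node_02", "This document has start date in the metadata", {"start_date": "20/03/2023"}),
]

# Each cleaned dict can then be passed as metadata=... to TextNode(...).
cleaned = [(node_id, text, normalize_metadata(md)) for node_id, text, md in raw_records]
print(cleaned)
```

The filter then uses value=NULL_SENTINEL, so indexing and querying share one definition of the placeholder. Only None is replaced; falsy values such as 0 or False are left untouched.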

Complete example, using a local Ollama embedding model:

from llama_index.core.schema import TextNode
from llama_index.core.vector_stores import (
    MetadataFilter,
    MetadataFilters,
    FilterOperator,
)
from llama_index.core import VectorStoreIndex, Settings
from llama_index.embeddings.ollama import OllamaEmbedding

# 1) Create an embedding model served by a local Ollama instance
embed_model = OllamaEmbedding(
    model_name="llama3.2",
    base_url="http://localhost:11434",
)

# 2) Tell LlamaIndex to use this embedder globally
Settings.embed_model = embed_model

# using metadata={"start_date": "None"}
node_01 = TextNode(
    text="This document has None in the metadata",
    id_="node_01",
    metadata={"start_date": "None"},
)
node_02 = TextNode(
    text="This document has start date in the metadata",
    id_="node_02",
    metadata={"start_date": "20/03/2023"},
)

doc_index = VectorStoreIndex([node_01, node_02])

# Debug: Check what's actually stored
print("Index nodes:\n", [node.metadata for node in doc_index.docstore.docs.values()])

filter_null_start_date = MetadataFilter(key="start_date", operator=FilterOperator.EQ, value=str(None))
filters = MetadataFilters(filters=[filter_null_start_date])
retriever = doc_index.as_retriever(filters=filters, similarity_top_k=1)
nodes = retriever.retrieve("this")

print("Retrieved nodes:\n", [(node.node_id, node.metadata) for node in nodes])

Output:

Index nodes:
 [{'start_date': 'None'}, {'start_date': '20/03/2023'}]
Retrieved nodes:
 [('node_01', {'start_date': 'None'})]

# using metadata={"start_date": ""}
node_01 = TextNode(
    text="This document has None in the metadata",
    id_="node_01",
    metadata={"start_date": ""},
)
node_02 = TextNode(
    text="This document has start date in the metadata",
    id_="node_02",
    metadata={"start_date": "20/03/2023"},
)

doc_index = VectorStoreIndex([node_01, node_02])

# Debug: Check what's actually stored
print("Index nodes:\n", [node.metadata for node in doc_index.docstore.docs.values()])

filter_null_start_date = MetadataFilter(key="start_date", operator=FilterOperator.EQ, value="")
filters = MetadataFilters(filters=[filter_null_start_date])
retriever = doc_index.as_retriever(filters=filters, similarity_top_k=1)
nodes = retriever.retrieve("this")

print("Retrieved nodes:\n", [(node.node_id, node.metadata) for node in nodes])

Output:

Index nodes:
 [{'start_date': ''}, {'start_date': '20/03/2023'}]
Retrieved nodes:
 [('node_01', {'start_date': ''})]
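Because the placeholder is what comes back in each retrieved node's metadata, you may want to map it back to None after retrieval. A minimal sketch, assuming the "None" sentinel from the first example; restore_nulls and NULL_SENTINEL are hypothetical names, not LlamaIndex APIs:

```python
from typing import Any, Dict

# Hypothetical placeholder; must match the value used at indexing time.
NULL_SENTINEL = "None"


def restore_nulls(metadata: Dict[str, Any], sentinel: str = NULL_SENTINEL) -> Dict[str, Any]:
    """Return a copy of metadata with every sentinel value mapped back to None."""
    return {key: (None if value == sentinel else value) for key, value in metadata.items()}


# Metadata shaped like the "Retrieved nodes" output of the first example.
retrieved = [("node_01", {"start_date": "None"})]
print([(node_id, restore_nulls(md)) for node_id, md in retrieved])
```

With a real retriever this would be applied as [(node.node_id, restore_nulls(node.metadata)) for node in nodes]. Choose a sentinel that cannot occur as a genuine value, since any real "None" string would also be turned into None.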