I am following this example from the llama_index docs: https://github.com/run-llama/llama_index/blob/main/docs/examples/metadata_extraction/MetadataExtractionSEC.ipynb to use a custom metadata extractor with my code like so:
from llama_index.node_parser import SimpleNodeParser
from llama_index.node_parser.extractors import (
    MetadataExtractor,
    MetadataFeatureExtractor,
)

class CustomExtractor(MetadataFeatureExtractor):
    def extract(self, nodes):
        metadata_list = [
            {
                "custom": node.metadata["document_title"]
                + "\n"
                + node.metadata["excerpt_keywords"]
            }
            for node in nodes
        ]
        return metadata_list

metadata_extractor = MetadataExtractor(
    extractors=[
        CustomExtractor()
    ],
)
But running the code fails with the following error:
Can't instantiate abstract class CustomExtractor with abstract method class_name
I don't believe there is a syntax error here.
MetadataFeatureExtractor extends BaseExtractor, which extends BaseComponent, which defines an @abstractmethod called class_name(). You need to implement that in your custom extractor. Try:
class CustomExtractor(MetadataFeatureExtractor):
    @classmethod
    def class_name(cls):
        return 'CustomExtractor'

    def extract(self, nodes):
        metadata_list = [
            {
                "custom": node.metadata["document_title"]
                + "\n"
                + node.metadata["excerpt_keywords"]
            }
            for node in nodes
        ]
        return metadata_list
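
If it helps to see why the error appears, here is a minimal sketch of the same mechanism using only Python's standard abc module. The classes BaseComponentSketch, FeatureExtractorSketch, BrokenExtractor and FixedExtractor are hypothetical stand-ins, not the real llama_index classes; the sketch only assumes, as described above, that a base class declares class_name() as an abstract method.

```python
from abc import ABC, abstractmethod


class BaseComponentSketch(ABC):
    """Hypothetical stand-in for BaseComponent (not the llama_index class)."""

    @classmethod
    @abstractmethod
    def class_name(cls) -> str:
        """Subclasses must return a name for themselves."""


class FeatureExtractorSketch(BaseComponentSketch):
    """Hypothetical stand-in for MetadataFeatureExtractor."""

    @abstractmethod
    def extract(self, nodes):
        """Return one metadata dict per node."""


class BrokenExtractor(FeatureExtractorSketch):
    # Implements extract() but not class_name(), like the code in the question.
    def extract(self, nodes):
        return [{"custom": n} for n in nodes]


class FixedExtractor(FeatureExtractorSketch):
    # Implements both abstract methods, so it can be instantiated.
    @classmethod
    def class_name(cls) -> str:
        return "FixedExtractor"

    def extract(self, nodes):
        return [{"custom": n} for n in nodes]


# Python records the still-unimplemented abstract names on the class itself.
missing = sorted(BrokenExtractor.__abstractmethods__)

# Instantiating a class with unimplemented abstract methods raises TypeError.
try:
    BrokenExtractor()
    error_message = None
except TypeError as exc:
    error_message = str(exc)

fixed = FixedExtractor()
result = fixed.extract(["a", "b"])
```

Inspecting `__abstractmethods__` on your own class is a quick way to find out which methods an abstract base still expects you to implement, without reading through the whole inheritance chain.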