spacyspacy-3spacy-transformers

Is it possible to add custom entity labels to Spacy 3.0 config file?


I'm working on a custom NER model with spacy-transformers and roBERTa. I'm really only using the CLI for this and am trying to alter my Spacy config.cfg file to account for custom entity labels in the pipeline.

I'm new to Spacy, but I've gathered that people usually use ner.add_label to accomplish this. I wonder if I might be able to change something in [initialize.components.ner.labels] of the config, but haven't come across a good way to do that.

I can't seem to find any options to alter the config file in a similar fashion - does anyone know if this is possible, or what might be the most succinct way to achieve those custom labels?

Edited for clarity: My issue could be different than my config theory. Right now I am getting an output, but instead of text labels they are numeric labels, such as:

('Oct',383) ('2019',383) ('February',383)

Thank you in advance for your help!


Solution

  • If you are working with the config-based training, generally you should not have to specify the labels anywhere - spaCy will look at the training data and get the list of labels from there.

    There are a few cases where this won't work.

    You have labels that aren't in your training data. These can't be learned so I would just consider this an error, but sometimes you have to work with the data you've been given.

    You training data is very large. In this case reading over all the training data to get a complete list of labels can be an issue. You can use the init labels command to generate data so that the input data doesn't have to be scanned every time you start training.