spacy, huggingface-transformers, bert-language-model, spacy-transformers

How to use an existing huggingface-transformers model with spaCy?


I'm here to ask whether it is possible to use an existing trained huggingface-transformers model with spaCy.

My first naive attempt was to load it via spacy.load('bert-base-uncased'). It didn't work, because spaCy demands a certain structure, which is understandable.

Now I'm trying to figure out how to use the spacy-transformers library to load the model, create the spaCy structure, and from that point on use it as a normal spaCy-aware model.

I don't know if this is even possible, as I couldn't find anything on the subject. I've tried to read the documentation, but all the guides, examples, and posts I found start from a spaCy-structured model like spacy/en_core_web_sm. How was that model created in the first place? I can't believe someone has to train everything again with spaCy.

Can I get some help from you?

Thanks.


Solution

  • What you do is add a Transformer component to your pipeline and pass the name of your HuggingFace model as a parameter to it. This is covered in the docs, though people do have trouble finding it. It's important to understand that a Transformer is only one piece of a spaCy pipeline, and you should understand how it all fits together.

    To pull from the docs, this is how you specify a custom model in a config:

    [components.transformer.model]
    @architectures = "spacy-transformers.TransformerModel.v3"
    # Change the model name here to point at your own HuggingFace model
    name = "bert-base-cased"
    tokenizer_config = {"use_fast": true}
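
    If you prefer to assemble the pipeline in code rather than in a config file, here is a minimal sketch of the same thing (assuming spaCy v3 with spacy-transformers installed, and bert-base-uncased standing in for whatever HuggingFace model you want to use):

    import spacy

    # Start from a blank English pipeline and add the Transformer component,
    # pointing it at a HuggingFace model by name.
    nlp = spacy.blank("en")
    nlp.add_pipe(
        "transformer",
        config={
            "model": {
                "@architectures": "spacy-transformers.TransformerModel.v3",
                "name": "bert-base-uncased",  # any HuggingFace model name
                "tokenizer_config": {"use_fast": True},
            }
        },
    )
    nlp.initialize()  # loads the transformer weights

    doc = nlp("spaCy can use HuggingFace transformers as feature sources.")
    print(doc._.trf_data)  # TransformerData holding the raw model output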
    

    Going back to why you need to understand spaCy's structure: in spaCy, Transformers are only sources of features. If your HuggingFace model has an NER head or some other task head attached, that head will not be used. So if you use a custom model, you'll need to train other components, like NER, on top of it.
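
    As a sketch of what "training other components on top" looks like, the snippet below adds an untrained NER component that receives the transformer's output through a TransformerListener. The architecture names follow the configs spaCy generates; treat the exact hyperparameter values as placeholders:

    # Continuing from the pipeline above: the listener makes NER consume the
    # transformer's token representations instead of embedding text itself.
    nlp.add_pipe(
        "ner",
        config={
            "model": {
                "@architectures": "spacy.TransitionBasedParser.v2",
                "state_type": "ner",
                "extra_state_tokens": False,
                "hidden_width": 64,
                "maxout_pieces": 2,
                "use_upper": False,
                "tok2vec": {
                    "@architectures": "spacy-transformers.TransformerListener.v1",
                    "grad_factor": 1.0,
                    "pooling": {"@layers": "reduce_mean.v1"},
                    "upstream": "*",
                },
            }
        },
    )
    # The NER weights start out random: you still need labelled data and a
    # training run before this component predicts anything useful.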

    Also note that spaCy has a variety of built-in, non-Transformer models. These are very fast to train, and in many situations they give performance comparable to Transformers; even if they aren't as accurate, you can use the built-in models to get your pipeline configured and then just swap in a Transformer.

    all the guides, examples, and posts I found start from a spaCy-structured model like spacy/en_core_web_sm. How was that model created in the first place?

    Did you see the quickstart? The pretrained models are created using configs similar to what you get from that.
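
    For reference, the quickstart boils down to generating a config with spacy init config and then training from it. A hypothetical end-to-end run (assuming an English NER pipeline and .spacy training files you have already prepared) looks roughly like:

    # CPU-oriented config using spaCy's built-in, non-Transformer models:
    python -m spacy init config config.cfg --lang en --pipeline ner
    # Transformer-based config, as used for the pretrained trf pipelines:
    python -m spacy init config config.cfg --lang en --pipeline ner --gpu
    # Train the pipeline described by the config:
    python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./dev.spacy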