
How can I train a spaCy entity linking model on a GPU?

When I train the spaCy entity linking model following the wiki_entity_linking documentation, I found that the model was trained on the CPU. Training takes a very long time: about 3 days for 2 epochs on a machine with 16 CPUs and 64 GB of memory.

The command is: python -t 50000 -d 10000 -o xxx. How can I use the GPU for the training phase?


  • You will need to refactor the code to call spacy.require_gpu() before initialising your NLP models; for more information, refer to the docs:

    Before doing this, I would make sure your task is running on all cores. If it is not, you could use joblib to process minibatch partitions of your job in parallel with multiprocessing:

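        from functools import partial
        from joblib import Parallel, delayed
        from spacy.util import minibatch
        partitions = minibatch(texts, size=batch_size)  # split the texts into batches
        executor = Parallel(n_jobs=n_jobs, backend="multiprocessing", prefer="processes")
        do = delayed(partial(transform_texts, nlp))  # bind nlp as the first argument
        tasks = (do(i, batch, output_dir) for i, batch in enumerate(partitions))
        executor(tasks)  # run one job per batch; without this call nothing executes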

    For more information, here's a joblib multiprocessing NER training example from the docs:
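A minimal sketch of where the GPU call goes, assuming a spaCy installation with a CUDA-enabled CuPy. It uses spacy.prefer_gpu(), the non-strict counterpart of spacy.require_gpu(), so the same code also runs on a CPU-only machine; spacy.blank("en") is only a placeholder for whatever pipeline the wiki_entity_linking training script creates or loads. This is not that script itself: the point is just that the device must be chosen before the pipeline exists.

```python
import spacy

# Activate the GPU BEFORE any pipeline is created or loaded, so the model
# weights are allocated on the GPU. prefer_gpu() returns True if a GPU was
# activated and False otherwise, in which case spaCy keeps using the CPU.
# spacy.require_gpu() is the strict variant: it raises an error instead of
# falling back to the CPU.
on_gpu = spacy.prefer_gpu()
print("Using GPU" if on_gpu else "No GPU found, using CPU")

# Only now build or load the pipeline. spacy.blank("en") is a placeholder
# for the nlp object the training script actually works with.
nlp = spacy.blank("en")
```

In the training script, swapping prefer_gpu() for require_gpu() makes a missing GPU fail loudly instead of silently training on the CPU again.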
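Independently of the docs example, here is a self-contained sketch of the minibatch partitioning pattern from the answer's snippet. transform_texts and process_partitions are hypothetical helpers, not spaCy or joblib APIs: the worker just tokenizes one partition with a blank English pipeline and writes it to a file. The demo passes n_jobs=1, which joblib runs in-process so the sketch can be checked quickly; on a real corpus you would raise n_jobs, or use -1 for all cores.

```python
from functools import partial
from pathlib import Path
import tempfile

import spacy
from joblib import Parallel, delayed
from spacy.util import minibatch


def transform_texts(nlp, batch_id, texts, output_dir):
    """Hypothetical worker: tokenize one partition and write it to <batch_id>.txt."""
    out_path = Path(output_dir) / f"{batch_id}.txt"
    with out_path.open("w", encoding="utf8") as f:
        for doc in nlp.pipe(texts):
            f.write(" ".join(token.text for token in doc) + "\n")
    return len(texts)  # how many texts this worker handled


def process_partitions(nlp, texts, output_dir, batch_size=1000, n_jobs=-1):
    """Hypothetical wrapper: split texts into minibatches, one joblib job each."""
    partitions = minibatch(texts, size=batch_size)
    executor = Parallel(n_jobs=n_jobs, backend="multiprocessing", prefer="processes")
    do = delayed(partial(transform_texts, nlp))
    tasks = (do(i, batch, output_dir) for i, batch in enumerate(partitions))
    return executor(tasks)  # one return value per partition, in order


nlp = spacy.blank("en")  # stand-in pipeline; no trained model needed
texts = ["Hello world.", "A second text.", "A third one.", "And a fourth."]
with tempfile.TemporaryDirectory() as tmp:
    # n_jobs=1 keeps this demo in-process; use more jobs on a real corpus.
    counts = process_partitions(nlp, texts, tmp, batch_size=2, n_jobs=1)
    written = sorted(p.name for p in Path(tmp).iterdir())
print(counts, written)
```

Note that this parallelizes processing of independent text batches across CPU cores; it does not move training onto the GPU.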