gpu entity spacy named-entity-recognition entity-linking

How can I train a spaCy entity linking model on a GPU?


I am training a spaCy entity linking model by following the wiki_entity_linking documentation, and I found that the model is trained on the CPU. Training is very slow: about 3 days for 2 epochs on a machine with 16 CPU cores and 64 GB of memory.

The command is: python wikidata_train_entity_linker.py -t 50000 -d 10000 -o xxx. My question is: how can I use a GPU for the training phase?


Solution

  • You will need to refactor the code to call spacy.require_gpu() before initialising your NLP models. For more information, refer to the docs: https://spacy.io/api/top-level#spacy.require_gpu

    Before doing this, I would check that your task is actually using all CPU cores. If it is not, you could use joblib to process minibatch partitions of your job in parallel:

        from functools import partial
        from joblib import Parallel, delayed
        from spacy.util import minibatch

        # transform_texts, nlp, texts, batch_size, n_jobs and output_dir come from your own script
        partitions = minibatch(texts, size=batch_size)
        executor = Parallel(n_jobs=n_jobs, backend="multiprocessing", prefer="processes")
        do = delayed(partial(transform_texts, nlp))
        tasks = (do(i, batch, output_dir) for i, batch in enumerate(partitions))
        executor(tasks)
    

    For more information, here is the joblib multiprocessing example from the docs: https://spacy.io/usage/examples#multi-processing
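
As a rough sketch of the GPU suggestion above, the call has to happen before any pipeline is created or loaded. The helper name activate_gpu and its strict flag are hypothetical and not part of spaCy or of wikidata_train_entity_linker.py. The sketch also assumes spaCy was installed with GPU support (CuPy), for example through one of the spacy[cuda...] extras.

```python
import spacy


def activate_gpu(strict=False):
    """Hypothetical helper: switch spaCy to the GPU before any model is created.

    strict=True  -> spacy.require_gpu(), which raises an error if no GPU is usable.
    strict=False -> spacy.prefer_gpu(), which returns False and stays on the CPU.
    """
    if strict:
        spacy.require_gpu()
        return True
    return spacy.prefer_gpu()


# Call this first, then build or load the pipeline.
using_gpu = activate_gpu(strict=False)
print("Using GPU" if using_gpu else "No GPU available, staying on CPU")

# Assumption: a blank English pipeline stands in for the real entity linking setup.
nlp = spacy.blank("en")
```

In the training script, the equivalent change would be to make the require_gpu() or prefer_gpu() call near the top of the main function, before the model is loaded. Passing strict=True is useful when a silent fallback to the CPU would cost days of training time.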