I trained the spaCy entity linking model following the wiki_entity_linking documentation, and I found that the model was trained on CPU. Each epoch takes a very long time (about 3 days for 2 epochs on 16 CPU cores with 64 GB of memory).
The command is:
python wikidata_train_entity_linker.py -t 50000 -d 10000 -o xxx
My question is: how can I use the GPU for the training phase?
You will need to refactor the code to call spacy.require_gpu() before initialising your NLP models. For more information, see the docs: https://spacy.io/api/top-level#spacy.require_gpu
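As a minimal sketch of that call order (the pipeline name en_core_web_lg here is just a placeholder for whatever model you load):

import spacy

spacy.require_gpu()  # raises an error if no GPU is available; call before loading the pipeline
nlp = spacy.load("en_core_web_lg")  # placeholder pipeline; loaded after require_gpu, so it runs on the GPU

If you want a soft fallback to CPU when no GPU is present, spacy.prefer_gpu() returns a boolean instead of raising.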
Before doing this, I would make sure your task is running on all cores. If it is not, you could use joblib to multiprocess minibatch partitions of your job:
from functools import partial
from joblib import Parallel, delayed
from spacy.util import minibatch

# texts, batch_size, n_jobs, nlp and output_dir are assumed to be defined already
partitions = minibatch(texts, size=batch_size)
executor = Parallel(n_jobs=n_jobs, backend="multiprocessing", prefer="processes")
do = delayed(partial(transform_texts, nlp))  # bind nlp; each task gets (batch_id, texts, output_dir)
tasks = (do(i, batch, output_dir) for i, batch in enumerate(partitions))
executor(tasks)
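The snippet assumes a transform_texts(nlp, batch_id, texts, output_dir) worker function that is not shown above. As a rough, illustrative sketch of its shape (the docs example linked below contains the real one):

from pathlib import Path

def transform_texts(nlp, batch_id, texts, output_dir):
    # Illustrative worker: pipe one minibatch through the pipeline in this
    # process and write the extracted entities to a per-batch output file.
    out_path = Path(output_dir) / f"batch_{batch_id}.txt"
    with out_path.open("w", encoding="utf8") as f:
        for doc in nlp.pipe(texts):
            f.write(" ".join(ent.text for ent in doc.ents) + "\n")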
For more information, here's a joblib multiprocessing NER training example from the spaCy docs: https://spacy.io/usage/examples#multi-processing