python, gpu, spacy, named-entity-recognition, custom-training

SpaCy GPU memory utilization for NER training


My training code:

import spacy
from spacy.training import Example
from spacy.util import minibatch, compounding

spacy.require_gpu()
nlp = spacy.blank('en')

if 'ner' not in nlp.pipe_names:
    ner = nlp.add_pipe('ner')
else:
    ner = nlp.get_pipe('ner')

docs = load_data(ANNOTATED_DATA_FILENAME_BIN)
train_data, test_data = split_data(docs, DATA_SPLIT)

unique_labels = set(ent.label_ for doc in train_data for ent in doc.ents)
for label in unique_labels:
    ner.add_label(label)

optimizer = nlp.initialize()

for i in range(EPOCHS):
    print(f"Starting Epoch {i+1}...")
    losses = {}
    batches = minibatch(train_data, size=compounding(4., 4096, 1.001))
    for batch in batches:
        for doc in batch:
            example = Example.from_dict(doc, {'entities': [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]})
            nlp.update([example], drop=0.5, losses=losses, sgd=optimizer)
    print(f"Losses at iteration {i}: {losses}")

This code barely utilizes GPU memory: utilization stays at about 11-13% during training, which is almost the same as idle (as reported by nvidia-smi).

I ran an allocation test with torch, and all 8 GB were allocated, so the server itself works fine. The problem is with spaCy or my code.
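
For reference, the allocation test was roughly the following (a minimal sketch; the tensor size is arbitrary):

import torch

# Sanity check: the GPU is visible and a large block of memory can be allocated.
assert torch.cuda.is_available()
x = torch.empty(1024, 1024, 512, device="cuda")  # ~2 GiB of float32
print(f"{torch.cuda.memory_allocated() / 1024**3:.1f} GiB allocated")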

Could you please help?


Solution

  • Quoting William James Mattingly, Ph.D., who graciously helped (an end-to-end sketch based on his pointers follows the quote):

    This may be due to spaCy's change in training for 3.0. Training is now done via projects and the command line. This is how we used to train models for 2.0, and while it works, I believe certain issues arise; this may be one of them. The newer approach passes an argument in the CLI when you train the model: https://spacy.io/usage/training

    In the docs (spacy.io), you can specify in the config how to train and on which device.

    [system]
    gpu_allocator = "pytorch"

    This is the important bit. 👏 👍 😊

    Then, when you run spacy train in the CLI, you'd do something like this:

    python -m spacy train config.cfg --gpu-id 0

    Thank you!
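
To use that config/CLI workflow with data that is already loaded as Doc objects (as in the question), the train and dev splits first have to be written out as .spacy files. A minimal sketch, reusing the question's load_data/split_data helpers and assuming the file names train.spacy and dev.spacy:

from spacy.tokens import DocBin

docs = load_data(ANNOTATED_DATA_FILENAME_BIN)
train_data, test_data = split_data(docs, DATA_SPLIT)

# Serialize each split into the binary format the CLI trainer expects.
DocBin(docs=train_data).to_disk("train.spacy")
DocBin(docs=test_data).to_disk("dev.spacy")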
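
The config itself can be generated from the CLI and then edited to set gpu_allocator = "pytorch" under [system] as quoted above. The path overrides below are just one way to point the trainer at those files; they can equally be set in the [paths] section of config.cfg:

# Generate a base English NER config (add --gpu to request a transformer-based pipeline).
python -m spacy init config config.cfg --lang en --pipeline ner

# Train on GPU 0, overriding the data paths on the command line.
python -m spacy train config.cfg --gpu-id 0 --output ./output \
    --paths.train ./train.spacy --paths.dev ./dev.spacy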