python spacy coreference-resolution

en_coref_lg model in spacy


Hi, I am trying to run a simple coreference resolution script in Python:

import spacy
nlp = spacy.load('en_coref_lg')
doc = nlp(u'Phone area code will be valid only when all the below conditions are met. It cannot be left blank. It should be numeric. It cannot be less than 200. Minimum number of digits should be 3. ')
print(doc._.coref_clusters)
print(doc._.coref_resolved)

It raises the following error:

"OSError: [E050] Can't find model 'en_coref_lg'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data"

If I try to install en_coref_lg with python -m spacy download en_coref_lg, I get:

"✘ No compatible model found for 'en_coref_lg' (spaCy v2.3.2)."

What should I do?


Solution

  • Install neuralcoref and spacy==2.1.0:

    pip uninstall spacy 
    pip uninstall neuralcoref
    pip install spacy==2.1.0 
    pip install neuralcoref --no-binary neuralcoref
    

    Run your code:

    import spacy
    import neuralcoref
    nlp = spacy.load('en_core_web_md')
    neuralcoref.add_to_pipe(nlp)
    doc = nlp(u'Phone area code will be valid only when all the below conditions are met. It cannot be left blank. It should be numeric. It cannot be less than 200. Minimum number of digits should be 3.')
    print(doc._.has_coref)
    print(doc._.coref_clusters)
    # Output:
    # True
    # [Phone area code: [Phone area code, It, It, It]]
    

    Note the pinned version, spacy==2.1.0. It is required if you want to install neuralcoref with pip.

    Alternatively, build from source:

    git clone https://github.com/huggingface/neuralcoref.git
    cd neuralcoref
    pip install -r requirements.txt # check for the desired spacy version
    python setup.py install
    

    Proof that the source build works with a newer spaCy:

    import spacy
    import neuralcoref
    nlp = spacy.load('en_core_web_md')
    neuralcoref.add_to_pipe(nlp)
    print(spacy.__version__)
    doc = nlp(u'Phone area code will be valid only when all the below conditions are met. It cannot be left blank. It should be numeric. It cannot be less than 200. Minimum number of digits should be 3.')
    print(doc._.has_coref)
    print(doc._.coref_clusters)
    # Output:
    # 2.3.2
    # True
    # [Phone area code: [Phone area code, It, It, It]]
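
If you are not sure which of the two routes applies to your environment, a small check on the spaCy version string can decide it. This is only a minimal sketch of the rule stated above (pip for spacy==2.1.0, source build otherwise). The names PINNED_SPACY, parse_version and install_route are made up for illustration and are not part of spaCy or neuralcoref.

```python
import re

# Version named in the answer above; an assumption of this sketch, adjust if needed.
PINNED_SPACY = (2, 1, 0)


def parse_version(version):
    """Turn a version string such as '2.3.2' into a 3-tuple of ints, e.g. (2, 3, 2)."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits of each piece
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:  # pad short versions such as '2.1' to (2, 1, 0)
        parts.append(0)
    return tuple(parts)


def install_route(spacy_version):
    """Return 'pip' when spaCy matches the pinned version, otherwise 'source'."""
    return "pip" if parse_version(spacy_version) == PINNED_SPACY else "source"


if __name__ == "__main__":
    print(install_route("2.1.0"))  # pip
    print(install_route("2.3.2"))  # source
```

In a real session you would pass spacy.__version__ instead of a literal string, for example install_route(spacy.__version__).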