
ERROR: Could not install packages due to an EnvironmentError: [Errno 122] Disk quota exceeded


I'm using spaCy and scispacy on a Slurm cluster, not Colab. The small model (en_core_sci_sm) works fine, but the large model throws an error. Python version: 3.9.2.

pip list includes:

en-core-sci-sm     0.5.1
scipy              1.11.2
scispacy           0.5.2

Code, as in the scispacy example:

import spacy

if __name__ == '__main__':
    nlp = spacy.load("en_core_sci_lg")
    doc = nlp("Alterations in the hypocretin receptor 2 and preprohypocretin genes produce narcolepsy in some animals.")

Error:

File "/dir/../file.py", line 58, in main
    nlp = spacy.load("en_core_sci_lg")
  File "/dir/../venv/lib/python3.9/site-packages/spacy/__init__.py", line 54, in load
    return util.load_model(
  File "/dir/../venv/lib/python3.9/site-packages/spacy/util.py", line 439, in load_model
    raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'en_core_sci_lg'. It doesn't seem to be a Python package or a valid path to a data directory.

Tried so far: updating pip, spaCy, and scispacy. Then, in the venv, I ran:

pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_lg-0.5.1.tar.gz

Error -

 Collecting https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_lg-0.5.1.tar.gz
  Downloading https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_lg-0.5.1.tar.gz (532.3 MB)
     |████████████████████▋           | 343.7 MB 2.4 MB/s eta 0:01:18sda3: write failed, user block limit reached.
ERROR: Could not install packages due to an EnvironmentError: [Errno 122] Disk quota exceeded
I also tried setting pip's cache directory, as suggested elsewhere:

pip install --cache-dir=/..file_dir/.cache/ https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_lg-0.5.1.tar.gz

This gave the same error (Disk quota exceeded) and the same result.

Running df -h shows more than 2T free, so I'm not sure why I'm running out of space.
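
A possible explanation, offered as a guess rather than a diagnosis: df -h reports free space for a filesystem as a whole, while a per-user quota is enforced separately, and the "user block limit reached" message on sda3 points to such a quota. The sketch below only shows which filesystem each candidate directory lives on, so you can see whether pip's temporary directory, your home directory, and your working directory share a device. The helper name describe_path and the choice of directories are illustrative assumptions, not something from the original post.

```python
import os
import shutil
import tempfile


def describe_path(path):
    """Return the filesystem device ID and free space for an existing path."""
    usage = shutil.disk_usage(path)
    return {
        "path": path,
        "device": os.stat(path).st_dev,  # equal values mean the same filesystem
        "free_gb": round(usage.free / 1e9, 1),  # filesystem-wide, ignores quotas
    }


if __name__ == "__main__":
    # Directories pip may write to: the temp dir, the home dir (pip's default
    # cache lives under it), and the current working directory.
    candidates = (tempfile.gettempdir(), os.path.expanduser("~"), os.getcwd())
    for candidate in candidates:
        if os.path.exists(candidate):
            print(describe_path(candidate))
```

If the temp directory sits on a different device from the one with 2T free, the download may be hitting a quota on that other filesystem. The free_gb figure is the same number df reports and does not reflect any per-user limit.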


Solution

  • In case this is useful to someone in the future:

    I ended up doing three things that helped:

    1. In the .sh file that runs this code, I set a path for the Hugging Face cache directory:
    HF_HOME=$PWD/.hf_cach
    export HF_HOME

    2. Rebooted the computer.

    3. Inside my code, downloaded the model again at run time:

    import spacy
    import spacy.cli

    spacy.cli.download("en_core_web_lg")
    nlp = spacy.load("en_core_web_lg")
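
For readers who still hit the quota during pip install, here is a related sketch. It is not what was done above. It builds a pip command whose temporary files and cache both go to a directory you choose, using TMPDIR (read by Python's tempfile module, which pip uses) and PIP_CACHE_DIR (the environment form of --cache-dir). The function name pip_install_command and the scratch_dir layout are hypothetical. Whether this helps depends on which filesystem the quota applies to, since the venv itself still receives the installed package.

```python
import os
import sys


def pip_install_command(package_url, scratch_dir):
    """Build a pip command and environment that keep temp and cache files in scratch_dir."""
    tmp_dir = os.path.join(scratch_dir, "tmp")
    cache_dir = os.path.join(scratch_dir, "pip-cache")
    os.makedirs(tmp_dir, exist_ok=True)
    os.makedirs(cache_dir, exist_ok=True)

    env = dict(os.environ)  # copy, so the current process is left unchanged
    env["TMPDIR"] = tmp_dir  # where tempfile creates scratch files
    env["PIP_CACHE_DIR"] = cache_dir  # same effect as pip's --cache-dir option

    # Use the current interpreter so the package lands in the active venv.
    cmd = [sys.executable, "-m", "pip", "install", package_url]
    return cmd, env
```

Passing the result to subprocess.run(cmd, env=env, check=True) would run the install. The function only prepares the command and creates the two directories, so it can be inspected before anything is downloaded.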