python, python-2.7, ubuntu, nltk, spyder

Download error when using nltk.download()


I am experimenting with the NLTK package in Python. I tried to download the NLTK data using nltk.download() and got the error message below. How can I solve this problem? Thanks.

The system I used is Ubuntu installed under VMware. The IDE is Spyder.

[Screenshot of the error message; image not available]

Using nltk.download('all') downloads some packages, but it fails with an error message when it reaches oanc_masc:

[Screenshot of the error while downloading oanc_masc; image not available]


Solution

  • To download a particular dataset or model, use the nltk.download() function. For example, to download the punkt sentence tokenizer:

    $ python3
    >>> import nltk
    >>> nltk.download('punkt')
    

    If you're unsure of which data/model you need, you can start out with the basic list of data + models with:

    >>> import nltk
    >>> nltk.download('popular')
    

    It will download a list of "popular" resources.
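As a minimal sketch, the download can be guarded so that a resource is fetched only when it is missing, and failures are collected instead of being overlooked. This relies on nltk.data.find() raising LookupError for a missing resource and on nltk.download() returning False when a download fails. The helper name ensure_resources and its find/download parameters are hypothetical, added only so the logic can be tried without network access.

```python
def ensure_resources(resources, find=None, download=None):
    """Download each missing NLTK resource; return the ids that failed.

    `resources` maps a package id (e.g. 'punkt') to the path NLTK looks it
    up under (e.g. 'tokenizers/punkt').
    """
    if find is None or download is None:
        import nltk  # imported lazily so the helper can be tested with fakes
        find = find or nltk.data.find
        download = download or nltk.download

    failed = []
    for package_id, resource_path in resources.items():
        try:
            find(resource_path)  # raises LookupError when not installed
        except LookupError:
            if not download(package_id):  # a falsy result signals failure
                failed.append(package_id)
    return failed
```

For example, ensure_resources({'punkt': 'tokenizers/punkt', 'stopwords': 'corpora/stopwords'}) should return an empty list once both resources are available locally.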

    Make sure you have the latest version of NLTK, since it is actively maintained and improved:

    $ pip install --upgrade nltk
    

    EDITED

    If you run into errors when downloading larger datasets with NLTK, try this workaround from https://stackoverflow.com/a/38135306/610569:

    $ rm /Users/<your_username>/nltk_data/corpora/panlex_lite.zip
    $ rm -r /Users/<your_username>/nltk_data/corpora/panlex_lite
    $ python
    
    >>> import nltk
    >>> dler = nltk.downloader.Downloader()
    >>> dler._update_index()
    >>> dler._status_cache['panlex_lite'] = 'installed' # Trick the index into treating panlex_lite as if it's already installed.
    >>> dler.download('popular')
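
The two rm commands in this workaround can also be done from Python, which avoids hard-coding the home directory. This is only a sketch, assuming the data lives in ~/nltk_data as in the commands above. remove_partial_download is a hypothetical helper, not part of NLTK.

```python
import shutil
from pathlib import Path


def remove_partial_download(package_id, category="corpora", data_dir=None):
    """Delete a package's zip file and extracted folder, if present.

    Returns the list of paths that were removed.
    """
    # Assumption: data is stored in ~/nltk_data unless data_dir is given.
    base = Path(data_dir) if data_dir else Path.home() / "nltk_data"
    removed = []

    zip_path = base / category / f"{package_id}.zip"
    if zip_path.is_file():
        zip_path.unlink()
        removed.append(zip_path)

    folder = base / category / package_id
    if folder.is_dir():
        shutil.rmtree(folder)
        removed.append(folder)

    return removed
```

Calling remove_partial_download('panlex_lite') before retrying the download has the same effect as the rm lines, and it does nothing if neither path exists.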
    

    To find the nltk_data directory, see https://stackoverflow.com/a/36383314/610569

    To configure the nltk_data path, see https://stackoverflow.com/a/22987374/610569
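
For a quick illustration of the path handling: NLTK searches the directories listed in nltk.data.path, and it also reads the NLTK_DATA environment variable when it is imported. The sketch below puts a custom directory at the front of that list. The helper add_data_dir and its search_path parameter are hypothetical, introduced only for this example.

```python
import os


def add_data_dir(directory, search_path=None):
    """Put `directory` at the front of NLTK's data search path."""
    if search_path is None:
        import nltk  # lazy import: only needed when no list is passed in
        search_path = nltk.data.path  # list of directories NLTK searches

    directory = os.path.abspath(os.path.expanduser(directory))
    if directory in search_path:
        search_path.remove(directory)  # avoid duplicate entries
    search_path.insert(0, directory)
    return search_path
```

After add_data_dir('~/my_nltk_data'), a call such as nltk.download('punkt', download_dir=os.path.expanduser('~/my_nltk_data')) should store the data where NLTK will look first. The directory name here is just a placeholder.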