Tags: python, python-3.x, huggingface-transformers, huggingface-tokenizers

ImportError caused by file with the same name in working dir and file from imported package


I've run into an issue when trying to run a Python script. For simplicity, let's call it my_tokenizer.py; its content is just an import of Hugging Face's transformers. Unfortunately, running it from the working directory leads to an ImportError, and the cause seems to be that a file in the working directory has the same name as a module the transformers package imports internally.

Having 2 files in the working directory:

  • my_tokenizer.py
  • tokenizers.py

and running python my_tokenizer.py leads to the following ImportError:

Traceback (most recent call last):
  File "project/my_tokenizer.py", line 1, in <module>
    import transformers
  File "/Users/radoslawslowinski/opt/anaconda3/envs/aa_ee/lib/python3.8/site-packages/transformers/__init__.py", line 54, in <module>
    from .data import (
  File "/Users/radoslawslowinski/opt/anaconda3/envs/aa_ee/lib/python3.8/site-packages/transformers/data/__init__.py", line 6, in <module>
    from .processors import (
  File "/Users/radoslawslowinski/opt/anaconda3/envs/aa_ee/lib/python3.8/site-packages/transformers/data/processors/__init__.py", line 5, in <module>
    from .glue import glue_convert_examples_to_features, glue_output_modes, glue_processors, glue_tasks_num_labels
  File "/Users/radoslawslowinski/opt/anaconda3/envs/aa_ee/lib/python3.8/site-packages/transformers/data/processors/glue.py", line 24, in <module>
    from ...tokenization_utils import PreTrainedTokenizer
  File "/Users/radoslawslowinski/opt/anaconda3/envs/aa_ee/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 26, in <module>
    from .tokenization_utils_base import (
  File "/Users/radoslawslowinski/opt/anaconda3/envs/aa_ee/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 31, in <module>
    from tokenizers import AddedToken
ImportError: cannot import name 'AddedToken' from 'tokenizers' (/Users/radoslawslowinski/project/tokenizers.py)

Although I could just rename my file project/tokenizers.py to something else, I'd like to know why this occurs.


Solution

  • I think I've understood what causes the issue: the transformers package internally imports another package called tokenizers, and my local file tokenizers.py shadows it.

    This happens because my working directory is first in the list of paths that Python searches when resolving imports. It can be checked with:

    import sys
    print(sys.path)  # the script's directory is the first entry
    from transformers import BasicTokenizer  # still fails while the local tokenizers.py shadows the real package
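A quick way to see which file an import would resolve to, without actually importing anything, is importlib.util.find_spec. Here is a minimal self-contained sketch of the shadowing effect; the scratch directory and the module name mymod are made up for illustration:

```python
import importlib.util
import os
import sys
import tempfile

# Create a scratch directory containing a file that shadows a module name.
scratch = tempfile.mkdtemp()
with open(os.path.join(scratch, "mymod.py"), "w") as f:
    f.write("VALUE = 'local shadow'\n")

# Put the scratch directory at the front of sys.path, just like the
# script's directory when running `python my_tokenizer.py`.
sys.path.insert(0, scratch)

# find_spec reports which file an import would load, without loading it.
spec = importlib.util.find_spec("mymod")
print(spec.origin)  # a path inside the scratch directory, not site-packages
```

Running find_spec("tokenizers") from the project directory should likewise point at project/tokenizers.py rather than the installed package.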

    And to prove that the search for imports starts from the directory in which you call the script, you can move the first entry of sys.path to the end of the list, after which the following code works:

    import sys
    # Move the script's directory (sys.path[0]) to the end, so that
    # site-packages is searched before the local tokenizers.py.
    sys.path = sys.path[1:] + sys.path[:1]
    import transformers  # now resolves tokenizers from site-packages
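Rotating the whole list works, but it reorders every entry; a more targeted variant is to delete only sys.path[0]. The following self-contained sketch simulates the situation with two temporary directories standing in for site-packages and the working directory (demo_mod is a made-up module name):

```python
import os
import sys
import tempfile

# "sitepkg" stands in for site-packages, "workdir" for the project folder.
sitepkg = tempfile.mkdtemp()
workdir = tempfile.mkdtemp()
with open(os.path.join(sitepkg, "demo_mod.py"), "w") as f:
    f.write("ORIGIN = 'site-packages'\n")
with open(os.path.join(workdir, "demo_mod.py"), "w") as f:
    f.write("ORIGIN = 'working directory'\n")

sys.path.insert(0, sitepkg)
sys.path.insert(0, workdir)   # workdir is first, as when running a script

import demo_mod
print(demo_mod.ORIGIN)        # the local shadow wins

# Drop only the first entry instead of rotating the whole list,
# then force a re-import to pick up the real module.
del sys.path[0]
del sys.modules["demo_mod"]
import demo_mod
print(demo_mod.ORIGIN)        # now resolved from the site-packages stand-in
```

As a side note, if I'm not mistaken, Python 3.11+ also offers the -P command-line flag (and the PYTHONSAFEPATH environment variable) to keep the script's directory off sys.path entirely, which avoids this class of shadowing bug.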