Tags: python, huggingface-transformers, huggingface-tokenizers

AutoTokenizer.from_pretrained took forever to load


I used the following code to load my custom-trained tokenizer:

from transformers import AutoTokenizer
test_tokenizer = AutoTokenizer.from_pretrained('raptorkwok/cantonese-tokenizer-test')

The call never finishes loading. Replacing AutoTokenizer with PreTrainedTokenizerFast makes no difference; it still hangs.

How can I debug or fix this issue?


Solution

  • Downgrading transformers from 4.41.0 to 4.28.1 resolved the problem. With 4.28.1, both pipeline() and from_pretrained() load the tokenizer successfully in seconds.
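
Before downgrading, it may help to confirm which transformers version is actually installed in the active environment. The following is a minimal sketch that uses only the standard library; the helper names `installed_version` and `pin_command` and the constant `KNOWN_GOOD` are hypothetical and chosen for illustration. The version 4.28.1 is simply the one reported to work above, not a general recommendation.

```python
from importlib.metadata import PackageNotFoundError, version

# Version reported to work in the solution above (not a general recommendation).
KNOWN_GOOD = "4.28.1"


def installed_version(package: str = "transformers"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def pin_command(package: str = "transformers", target: str = KNOWN_GOOD) -> str:
    """Build the pip command that pins the package to the target version."""
    return f"pip install {package}=={target}"


if __name__ == "__main__":
    current = installed_version()
    if current is None:
        print("transformers is not installed in this environment")
    elif current != KNOWN_GOOD:
        # Only suggest the command; running it is left to the user.
        print(f"installed: {current}; to pin, run: {pin_command()}")
    else:
        print(f"already on {KNOWN_GOOD}")
```

Running the printed pip command in the same environment and then restarting the Python session should make the pinned version take effect.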