android-studio python-3.8 chaquopy gpt-2

What is the cause of HFValidationError in this code and how do I resolve this error?


My Python code in a Chaquopy Android Studio project:

import torch as tc
from transformers import GPT2Tokenizer, GPT2Model



def generate_text(txt):
    """
    Generate chat
    https://huggingface.co/gpt2
    """

    # Load model files
    tokenizer = GPT2Tokenizer.from_pretrained('assets/')  # This line causes the error
    model = GPT2Model.from_pretrained('assets/')
    # Move model to GPU if available
    device = tc.device("cuda" if tc.cuda.is_available() else "cpu")
    model.to(device)

    encoded_input = tokenizer(txt, return_tensors='pt')
    output = model(**encoded_input)

    return str(output)

It now shows the following error:

E/AndroidRuntime: FATAL EXCEPTION: main
    Process: com.example.chaquopy_130application, PID: 4867
    com.chaquo.python.PyException: HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'assets/'.
        at <python>.huggingface_hub.utils._validators.validate_repo_id(_validators.py:164)
        at <python>.huggingface_hub.utils._validators._inner_fn(_validators.py:110)
        at <python>.huggingface_hub.utils._deprecation.inner_f(_deprecation.py:103)
        at <python>.transformers.file_utils.get_list_of_files(file_utils.py:2103)
        at <python>.transformers.tokenization_utils_base.get_fast_tokenizer_file(tokenization_utils_base.py:3486)
        at <python>.transformers.tokenization_utils_base.from_pretrained(tokenization_utils_base.py:1654)
        at <python>.pythonScript.generate_text(pythonScript.py:30)

I have put all the files of the 124M GPT-2 model (checkpoint, encoder.json, hparams.json, model.ckpt.data-00000-of-00001, model.ckpt.index, model.ckpt.meta, and vocab.bpe) inside the 'assets' folder.

[Image: file structure]


Solution

  • The from_pretrained documentation does not clearly state how it distinguishes Hugging Face repository names from local paths, although all of its local path examples end with a slash.

    In any case, when loading data files with Chaquopy you must always use absolute paths, as the Chaquopy FAQ explains. Assuming your "assets" directory is at the same level as the Python code, you can do this:

    from os.path import dirname
    tokenizer = GPT2Tokenizer.from_pretrained(f'{dirname(__file__)}/assets/')
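
The traceback passes through validate_repo_id, which suggests that the relative path 'assets/' was not found as a local directory and was then treated as a repository name. The sketch below is one way to make that failure more obvious: it builds the absolute path next to the script and checks that the directory exists before handing it to from_pretrained. The helper names resolve_model_dir and load_tokenizer are hypothetical, not part of Chaquopy or transformers, and the layout (an "assets" directory beside the Python file) is the same assumption as above.

```python
import os


def resolve_model_dir(script_file, subdir="assets"):
    """Return the absolute path of `subdir` located next to `script_file`.

    Hypothetical helper: raises FileNotFoundError when the directory is
    missing, so the problem surfaces before the path reaches from_pretrained.
    """
    base = os.path.dirname(os.path.abspath(script_file))
    model_dir = os.path.join(base, subdir)
    if not os.path.isdir(model_dir):
        raise FileNotFoundError(f"Model directory not found: {model_dir}")
    return model_dir


def load_tokenizer(script_file):
    """Load the GPT-2 tokenizer from the directory next to `script_file`."""
    # Imported lazily so resolve_model_dir can be used without transformers.
    from transformers import GPT2Tokenizer

    return GPT2Tokenizer.from_pretrained(resolve_model_dir(script_file))
```

Inside the script from the question, this would be called as `tokenizer = load_tokenizer(__file__)`. The same resolved directory can be passed to `GPT2Model.from_pretrained`.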