My Python code in a Chaquopy Android Studio project:
import torch as tc
from transformers import GPT2Tokenizer, GPT2Model

def generate_text(txt):
    """
    Generate chat
    https://huggingface.co/gpt2
    """
    # Load model files
    tokenizer = GPT2Tokenizer.from_pretrained('assets/')  # This line is causing the error
    model = GPT2Model.from_pretrained('assets/')
    # Move model to GPU if available
    device = tc.device("cuda" if tc.cuda.is_available() else "cpu")
    model.to(device)
    encoded_input = tokenizer(txt, return_tensors='pt')
    output = model(**encoded_input)
    return str(output)
It now shows the following error:
E/AndroidRuntime: FATAL EXCEPTION: main
Process: com.example.chaquopy_130application, PID: 4867
com.chaquo.python.PyException: HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'assets/'.
at <python>.huggingface_hub.utils._validators.validate_repo_id(_validators.py:164)
at <python>.huggingface_hub.utils._validators._inner_fn(_validators.py:110)
at <python>.huggingface_hub.utils._deprecation.inner_f(_deprecation.py:103)
at <python>.transformers.file_utils.get_list_of_files(file_utils.py:2103)
at <python>.transformers.tokenization_utils_base.get_fast_tokenizer_file(tokenization_utils_base.py:3486)
at <python>.transformers.tokenization_utils_base.from_pretrained(tokenization_utils_base.py:1654)
at <python>.pythonScript.generate_text(pythonScript.py:30)
I have put all the files of the 124M GPT-2 model checkpoint, encoder.json, hparams.json, model.ckpt.data-00000-of-00001, model.ckpt.index, model.ckpt.meta, and vocab.bpe, inside the 'assets' folder.
The from_pretrained documentation is not entirely clear about how it distinguishes Hugging Face repository names from local paths, although all the local path examples end with a slash.
In any case, when loading data files with Chaquopy, you must always use absolute paths, as it says in the FAQ. So assuming your "assets" directory is at the same level as the Python code, you can do this:
from os.path import dirname
tokenizer = GPT2Tokenizer.from_pretrained(f'{dirname(__file__)}/assets/')
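If you want the failure to be easier to read, here is a minimal sketch of a helper that builds the absolute path and checks that the directory really exists before it is handed to from_pretrained. The traceback above suggests that a string which is not found as a local directory ends up being validated as a repository id, so an early check gives a clearer message. The name local_model_dir and its folder parameter are made up for illustration and are not part of Chaquopy or transformers; the sketch also assumes the "assets" directory is present on the device file system at run time.

```python
from os.path import abspath, dirname, isdir, join


def local_model_dir(module_file, folder="assets"):
    """Return the absolute path of `folder` located next to `module_file`.

    Hypothetical helper: raises FileNotFoundError if the directory is missing,
    rather than passing a bad path on to from_pretrained.
    """
    path = join(dirname(abspath(module_file)), folder)
    if not isdir(path):
        # Fail early with a clear message instead of letting the library
        # try to interpret the string as a repository id.
        raise FileNotFoundError(f"Model directory not found: {path}")
    return path
```

In your script you would then call GPT2Tokenizer.from_pretrained(local_model_dir(__file__)), and the same for GPT2Model.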