I am trying to replicate the code from this page.
At my workplace we have access to the transformers and PyTorch libraries, but we cannot connect to the internet from our Python environment. Could anyone help with how to get the script working after manually downloading the files to my machine?
My specific questions are:
Should I go to the location bert-base-uncased at main and download all the files? Do I have to put them in a folder with a specific name?
How should I change the code below?
# Load pre-trained model tokenizer (vocabulary)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# Tokenize our sentence with the BERT tokenizer.
tokenized_text = tokenizer.tokenize(marked_text)
How should I change the code below?
# Load pre-trained model (weights)
model = BertModel.from_pretrained('bert-base-uncased',
output_hidden_states = True, # Whether the model returns all hidden-states.
)
Please let me know if anyone has done this. Thanks!
### Update 1
I went to the link and manually downloaded all the files to a folder, then specified the path of that folder in my code. The tokenizer works, but this line fails:
model = BertModel.from_pretrained('bert-base-uncased', output_hidden_states = True)
Any idea what I should do? I noticed that the 4 big files have very strange names when downloaded... Should I rename them to the same names shown on the above page? Do I need to download any other files?
The error message is:
OSError: unable to load weights from pytorch checkpoint file for bert-base-uncased2/ at bert-base-uncased/pytorch_model.bin. If you tried to load a PyTorch model from a TF 2 checkpoint, please set from_tf=True
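For reference, here is a quick sanity check I can run on the folder (a minimal sketch; "bert-base-uncased" stands in for wherever I saved the files, and I am assuming from_pretrained looks for the standard file names pytorch_model.bin, config.json and vocab.txt, matching the names shown on the model page):

import os

# Check that the local folder contains the file names transformers looks for.
# The expected names below are an assumption based on the files listed on the model page.
local_dir = "bert-base-uncased"   # placeholder for the folder I downloaded into
expected = ["pytorch_model.bin", "config.json", "vocab.txt"]

present = set(os.listdir(local_dir))
for name in expected:
    print(name, "found" if name in present else "MISSING")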
Clone the model repo to download all the files:
git lfs install
git clone https://huggingface.co/bert-base-uncased
# if you want to clone without large files – just their pointers
# prepend your git clone with the following env var:
GIT_LFS_SKIP_SMUDGE=1
Git usage:
Download Git from https://git-scm.com/downloads
Paste these into your CLI (terminal):
a. git lfs install
b. git clone https://huggingface.co/bert-base-uncased
Wait for the download; it will take some time. You can monitor your network activity if you want to follow the progress.
Find the download directory by typing cd in your CLI and note the file path (e.g. "C:/Users/........./bert-base-uncased")
Use it as:
from transformers import BertModel, BertTokenizer
model = BertModel.from_pretrained("C:/Users/........./bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("C:/Users/........./bert-base-uncased")
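To confirm the local load works end to end, here is a minimal sketch (assuming a reasonably recent transformers version; the example sentence is just a placeholder) that mirrors the tokenize-and-encode snippet from the question:

import torch
from transformers import BertModel, BertTokenizer

local_path = "C:/Users/........./bert-base-uncased"   # same local folder as above
tokenizer = BertTokenizer.from_pretrained(local_path)
model = BertModel.from_pretrained(local_path, output_hidden_states=True)
model.eval()

text = "Here is a sample sentence."            # placeholder text
inputs = tokenizer(text, return_tensors="pt")  # the tokenizer is callable on recent transformers versions
with torch.no_grad():
    outputs = model(**inputs)

# With output_hidden_states=True, recent versions expose all layer outputs here:
print(len(outputs.hidden_states))  # 13 for bert-base: embeddings + 12 encoder layers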
Manual download, without git:
Download all the files from here https://huggingface.co/bert-base-uncased/tree/main
Put them in a folder named "yourfoldername"
Use it as:
model = BertModel.from_pretrained("C:/Users/........./yourfoldername")
tokenizer = BertTokenizer.from_pretrained("C:/Users/........./yourfoldername")
For the model only (manual download, without git):
Just click the download button here and download only the PyTorch pretrained model (about 420 MB): https://huggingface.co/bert-base-uncased/blob/main/pytorch_model.bin
Download the config.json file from here: https://huggingface.co/bert-base-uncased/tree/main
Put both of them in a folder named "yourfilename"
Use it as:
model = BertModel.from_pretrained("C:/Users/........./yourfilename")
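Note that this covers only the model; to use the tokenizer offline as well, you still need its vocabulary file (vocab.txt) from the same page. A short sketch of the model-only load, matching the snippet in the question (the path is the same placeholder as above):

from transformers import BertModel

# Load only the model from the local folder containing pytorch_model.bin and config.json.
# output_hidden_states=True matches the snippet in the question.
model = BertModel.from_pretrained(
    "C:/Users/........./yourfilename",
    output_hidden_states=True,
)
model.eval()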