I have a model that I was loading from Hugging Face using the following code:
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", trust_remote_code=True)
After loading the model, I made some modifications to its internal layers and added more layers. When I started the training/fine-tuning, I got an error that not everything is on the same device.
After more investigation, I found that my custom layers aren't distributed across multiple GPUs the way the original model's layers are. So I need something like device_map="auto", but applied after the model has been loaded.
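For example (a minimal illustration; the layer size is made up), a module created after loading is not covered by the existing device map and just sits on the default device:
import torch

# the loaded model is already spread across the GPUs by device_map="auto",
# but a freshly created layer lives on the default device (CPU)
extra_layer = torch.nn.Linear(4096, 4096)
print(extra_layer.weight.device)  # cpu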
So, simply, something like this:
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", trust_remote_code=True)
model.device_map = "auto"
I found out that there are actually two functions in accelerate for this. The first one, infer_auto_device_map, analyzes your model and the available memory on each device and computes a device map for it. The second one, dispatch_model, places the model's submodules on the devices according to that device map:
https://huggingface.co/docs/accelerate/en/package_reference/big_modeling#accelerate.dispatch_model
So basically, in your case, you can use the following code:
from transformers import AutoModelForCausalLM
from accelerate import dispatch_model, infer_auto_device_map

model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", trust_remote_code=True)

# ... your modifications: wrap the model and add the custom layers ...
new_model = CustomModel(model)
# ...

# recompute the device map for the modified model and redistribute it across the GPUs
device_map_dict = infer_auto_device_map(new_model)
dispatch_model(new_model, device_map=device_map_dict)
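If you need more control over the placement, infer_auto_device_map also accepts a max_memory argument, and the device map it returns is just a plain dict mapping submodule names to devices (the memory limits below are only an example for a 2-GPU machine):
# optionally cap per-device memory; keys are GPU indices plus "cpu"
device_map_dict = infer_auto_device_map(
    new_model,
    max_memory={0: "20GiB", 1: "20GiB", "cpu": "64GiB"},
)
print(device_map_dict)  # e.g. {'model.embed_tokens': 0, ..., 'lm_head': 1}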
P.S. This code still needs to be tested with fine-tuning.
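In the meantime, a quick sanity check (a minimal sketch) is to make sure no parameter, especially from the custom layers, was left on the CPU:
# print any parameter that was not dispatched to a GPU
for name, param in new_model.named_parameters():
    if param.device.type == "cpu":
        print(f"still on CPU: {name}")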