Tags: python, pytorch, huggingface-transformers, pre-trained-model, gpt-2

load_state_dict getting random results


import os
os.environ["CUDA_VISIBLE_DEVICES"]="0"
import torch
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM, GPT2LMHeadModel
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    device_map='auto',
)
model2 = GPT2LMHeadModel(model.config).to("cuda")  # fresh model with the same config
model2.load_state_dict(model.state_dict())         # copy the weights from model
tokenizer = AutoTokenizer.from_pretrained("gpt2")

t = tokenizer("hello_world", return_tensors="pt")["input_ids"].to("cuda")
a = model(t).logits
b = model2(t).logits
print(a - b)
print(a)
print(b)

model2 behaves very differently from model (its loss is much higher), even though the two models' structures and parameters are exactly the same. From the output, it looks as if something is randomized in model2. Could anyone tell me what is going on? I have the accelerate package installed.

The config and the parameters are the same. I also checked the forward functions, and there is no difference at all. However, after setting model2.transformer.forward = model.transformer.forward, the two models behave the same.
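To confirm that the parameters really are identical, here is a quick check (a sketch that assumes the snippet above has already run):

# compare every parameter tensor between the two models
for name, p in model.state_dict().items():
    assert torch.equal(p, model2.state_dict()[name]), name
print("all parameters identical")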


Solution

  • You need to set the models to eval mode to disable dropout if you want them to produce the same results. from_pretrained() returns model already in eval mode, whereas GPT2LMHeadModel(model.config) constructs model2 in train mode, so dropout was active only in model2; that is the "randomness" you observed.

    import os
    os.environ["CUDA_VISIBLE_DEVICES"]="0"
    import torch
    from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM, GPT2LMHeadModel
    
    model = AutoModelForCausalLM.from_pretrained(
        "gpt2",
        device_map='auto',
    )
    
    model2 = GPT2LMHeadModel(model.config).to("cuda")
    model2.load_state_dict(model.state_dict())
    
    # set both models to eval mode so dropout is disabled
    model.eval()
    model2.eval()
    
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    
    t = tokenizer("hello_world", return_tensors="pt")["input_ids"].to("cuda")
    a = model(t).logits
    b = model2(t).logits
    
    assert (a == b).all()
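
  • Note that the exact-equality assert holds here because both models hold identical weights and run the same input on the same device. In general, a tolerance-based comparison is safer, for example (a minimal sketch, assuming a and b from the snippet above):

    # tolerance-based check; more robust than exact equality when
    # comparing across devices or dtypes
    torch.testing.assert_close(a, b)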