import os
os.environ["CUDA_VISIBLE_DEVICES"]="0"
import torch
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM, GPT2LMHeadModel
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    device_map="auto",
)
model2 = GPT2LMHeadModel(model.config).to("cuda")
model2.load_state_dict(model.state_dict())
tokenizer = AutoTokenizer.from_pretrained("gpt2")
t = tokenizer("hello_world", return_tensors="pt")["input_ids"].to("cuda")
a = model(t).logits
b = model2(t).logits
print(a - b)
print(a)
print(b)
model2 behaves very differently from model (its loss is much higher), even though the model structures and parameters are exactly the same. From the output, it looks as if something is randomized in model2. Could anyone tell me what is going on? I have the "accelerate" package installed.
The config and the parameters are the same. I also checked the forward functions, and there is no difference at all. However, after setting model2.transformer.forward = model.transformer.forward, the two models behave the same.
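A quick sanity check (illustrative snippet, using the model and model2 objects from the code above) confirms the weights really are identical, so the difference must come from runtime behavior rather than the parameters:

for (n1, p1), (n2, p2) in zip(model.named_parameters(), model2.named_parameters()):
    # torch.equal checks exact element-wise equality of the two tensors
    assert n1 == n2 and torch.equal(p1, p2), f"mismatch in {n1}"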
You need to set the models to eval mode with model.eval() to disable dropout if you want them to produce the same results:
import os
os.environ["CUDA_VISIBLE_DEVICES"]="0"
import torch
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM, GPT2LMHeadModel
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    device_map="auto",
)
model2 = GPT2LMHeadModel(model.config).to("cuda")
model2.load_state_dict(model.state_dict())
# set to eval
model.eval()
model2.eval()
tokenizer = AutoTokenizer.from_pretrained("gpt2")
t = tokenizer("hello_world", return_tensors="pt")["input_ids"].to("cuda")
a = model(t).logits
b = model2(t).logits
assert (a == b).all()
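Note that from_pretrained() returns the model already in eval mode, while instantiating GPT2LMHeadModel(config) leaves it in training mode, which is why dropout was active only in model2 in the original snippet. A quick way to confirm (illustrative check, run right after constructing the two models and before the eval() calls):

print(model.training)   # False: from_pretrained() puts the model in eval mode
print(model2.training)  # True: a freshly constructed model starts in training mode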