I'm following this tutorial on training a causal language model from scratch.
In the tutorial they load the standard GPT2 as follows:
from transformers import AutoTokenizer, GPT2LMHeadModel, AutoConfig

config = AutoConfig.from_pretrained(
    "gpt2",
    vocab_size=len(tokenizer),
    n_ctx=context_length,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
)
model = GPT2LMHeadModel(config)
How can I load the same model, but use my custom fully connected network instead of the standard one? I mainly want to experiment with variations such as more/fewer layers, different activation functions, etc.
I found the source code here, but it's very convoluted, and I can't figure out how to replace the fully connected parts with custom ones, or what structure the custom ones should have in the first place (e.g., input/output size).
Update: For example, using an FC network such as:
import torch
import torch.nn as nn

class FC_model(nn.Module):
    def __init__(self):
        super(FC_model, self).__init__()
        self.fc1 = nn.Linear(768, 256)
        self.fc2 = nn.Linear(256, 256)
        self.fc3 = nn.Linear(256, 50000)

    def forward(self, x):
        x = torch.sin(self.fc1(x)) + torch.rand(1)
        x = torch.sin(self.fc2(x))
        x = self.fc3(x)
        return x
I'm assuming that by "fully connected network" you mean the fully connected (FC) / linear layers of the model.
from transformers import AutoTokenizer, GPT2LMHeadModel, AutoConfig, GPT2Config
configuration = GPT2Config()
model = GPT2LMHeadModel(configuration)
print(model)
This prints the modules inside the model:
GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
)
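The printout shows `Conv1D()` without sizes, so one way to find the input/output widths the question asks about is to read the weight shapes directly. This is a minimal sketch, assuming the default `GPT2Config` values; `n_layer=1` is used here only to keep the model small. Note that the `Conv1D` module in transformers stores its weight as (in_features, out_features), while `nn.Linear` stores (out_features, in_features).

```python
from transformers import GPT2Config, GPT2LMHeadModel

# One block is enough to inspect the layer sizes (assumption: default config otherwise)
config = GPT2Config(n_layer=1)
model = GPT2LMHeadModel(config)
block = model.transformer.h[0]

# Conv1D weights are stored as (in_features, out_features)
c_fc_shape = tuple(block.mlp.c_fc.weight.shape)
c_proj_shape = tuple(block.mlp.c_proj.weight.shape)
# nn.Linear weights are stored as (out_features, in_features)
lm_head_shape = tuple(model.lm_head.weight.shape)

print("mlp.c_fc  :", c_fc_shape)     # (768, 3072)
print("mlp.c_proj:", c_proj_shape)   # (3072, 768)
print("lm_head   :", lm_head_shape)  # (50257, 768)
```

So the MLP inside each block takes and returns vectors of size 768 (`n_embd`), and the output head maps 768 to the vocabulary size.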
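You can now access and replace the final FC layer (`lm_head`) like this:
import torch.nn as nn

model.lm_head = nn.Sequential(
    nn.Linear(in_features=768, out_features=256),
    nn.ReLU(inplace=True),
    # nn.Dropout rather than nn.Dropout1d: the input is (batch, seq, features),
    # which Dropout1d would interpret as (batch, channels, length)
    nn.Dropout(0.25),
    # The last layer must produce one logit per vocabulary entry
    nn.Linear(in_features=256, out_features=configuration.vocab_size),
)
The above is just a sample; you can experiment with different combinations, as long as the head takes inputs of size 768 and outputs one logit per token in the vocabulary. Note that this replaces only the output head; the `mlp` module inside each `GPT2Block` is left unchanged.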
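If the goal is to swap the feed-forward network inside each transformer block, the same attribute-assignment approach should work on `block.mlp`. The sketch below is an illustration rather than an official recipe: `CustomMLP` is a hypothetical module modelled on the `FC_model` from the question, and the small config values are chosen only so the example runs quickly. The one constraint is that the replacement maps `n_embd` features back to `n_embd` features, because the block adds its output to the residual stream.

```python
import torch
import torch.nn as nn
from transformers import GPT2Config, GPT2LMHeadModel


class CustomMLP(nn.Module):
    """Hypothetical replacement for GPT2MLP: n_embd in, n_embd out."""

    def __init__(self, n_embd, hidden=256):
        super().__init__()
        self.fc1 = nn.Linear(n_embd, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.fc3 = nn.Linear(hidden, n_embd)  # back to n_embd for the residual add

    def forward(self, x):
        x = torch.sin(self.fc1(x))
        x = torch.sin(self.fc2(x))
        return self.fc3(x)


# Small illustrative sizes; the number of blocks is set with n_layer
config = GPT2Config(vocab_size=1000, n_positions=128, n_embd=64, n_layer=2, n_head=2)
model = GPT2LMHeadModel(config)

# Swap the feed-forward module in every transformer block
for block in model.transformer.h:
    block.mlp = CustomMLP(config.n_embd)

# Forward pass on random token ids to check that the shapes still line up
input_ids = torch.randint(0, config.vocab_size, (2, 16))
logits = model(input_ids).logits
print(logits.shape)  # torch.Size([2, 16, 1000])
```

For simpler variations, changing the config alone may be enough: `n_layer` controls the number of blocks, and `activation_function` (for example `"relu"`) selects the activation used by the built-in MLP.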