python-3.x · huggingface-transformers · mlm

How can I measure gender or racial bias in a transformer-based language model?


I'm trying to measure biases in out-of-the-box transformer-based models using Python. I tried the transformers and mlm-bias libraries with bert-base-uncased from Hugging Face, but couldn't get the code below to work for the pre-trained model (Python 3.8).

Also, is there any way to measure biases specifically for models that were fine-tuned with a masked language modeling objective?

from transformers import AutoModel
import mlm_bias

model = AutoModel.from_pretrained('bert-base-uncased')

cps_dataset = mlm_bias.BiasBenchmarkDataset("cps")
cps_dataset.sample(indices=list(range(10)))

mlm_bias = mlm_bias.BiasMLM(model, cps_dataset) 

error:

HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'BertModel(
...

and then

OSError: Incorrect path_or_model_id: 'BertModel(
...

Solution

  • The mlm-bias package should work for evaluating biases in pretrained MLMs available through Hugging Face, as well as in fine-tuned/retrained MLMs. You can also compute the relative bias between two MLMs, or compare a retrained MLM against its pretrained base (see the sketch at the end of this answer for evaluating a locally fine-tuned model).

    You can use the package to compute bias scores across various bias types (gender, racial, socioeconomic, etc.) using benchmark datasets like CrowS-Pairs (CPS) and StereoSet (SS; intrasentence subset), or custom datasets.

    After installing it with !pip install mlm-bias, the following code works for me (Python 3.10). Note that BiasMLM expects the model name or path as a string, not an instantiated model object; passing the AutoModel instance is what triggered the HFValidationError and OSError in your code:

    import mlm_bias
    
    # load sample from the CrowS-Pairs (CPS) benchmark dataset
    cps_dataset = mlm_bias.BiasBenchmarkDataset("cps")
    cps_dataset.sample(indices=list(range(10)))
    
    # specify the model name or path
    model = "bert-base-uncased"
    
    # initialize the BiasMLM evaluator (use a new name so the mlm_bias module isn't shadowed)
    bias_evaluator = mlm_bias.BiasMLM(model, cps_dataset)
    
    # evaluate the model
    result = bias_evaluator.evaluate(inc_attention=True)
    
    # save the results
    result.save("./bert-base-uncased-results")
    
    # print the bias scores
    print(result.bias_scores)
    
    # print the eval results
    print(result.eval_results)
    

    For examples of how to load custom or locally saved models, check out the Hugging Face documentation.
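
    To answer your second question: if your fine-tuned model lives in a local directory (e.g. one produced by save_pretrained after training with transformers' Trainer and DataCollatorForLanguageModeling), you can point BiasMLM at that directory instead of a hub name. The sketch below assumes BiasMLM simply forwards the string to from_pretrained, so any path that AutoModelForMaskedLM.from_pretrained can load should work; the directory name is hypothetical.

    import mlm_bias
    from transformers import AutoModelForMaskedLM, AutoTokenizer
    
    # Hypothetical local directory containing a model fine-tuned with the MLM objective.
    # Here the pretrained checkpoint is saved there as a stand-in; in practice this would
    # be the output directory of your fine-tuning run.
    finetuned_dir = "./my-finetuned-bert"
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model.save_pretrained(finetuned_dir)
    tokenizer.save_pretrained(finetuned_dir)
    
    # Evaluate the fine-tuned model the same way: pass the path as a string,
    # not the model object.
    cps_dataset = mlm_bias.BiasBenchmarkDataset("cps")
    cps_dataset.sample(indices=list(range(10)))
    
    finetuned_evaluator = mlm_bias.BiasMLM(finetuned_dir, cps_dataset)
    finetuned_result = finetuned_evaluator.evaluate(inc_attention=True)
    print(finetuned_result.bias_scores)

    You can then compare these scores against the bert-base-uncased results from above to see how fine-tuning shifted the measured bias.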