I am trying to set up a sweep for my logistic regression model. I have read the wandb tutorials but cannot understand how to write the configuration, and especially what config=wandb.config means in the tutorials. I would really appreciate it if someone could explain the steps. Here is what I've done:
sweep_config = {
    'method': 'grid'
}
metric = {
    'name': 'f1-score',
    'goal': 'maximize'
}
sweep_config['metric'] = metric
parameters = {
    'penalty': {
        'values': ['l2']
    },
    'C': {
        'values': [0.01, 0.1, 1.0, 10.0, 100.0]
    }
}
sweep_config['parameters'] = parameters
Then I create the yaml file:
with open('config.yaml', 'w') as stream:
    yaml.dump(sweep_config, stream)
Then it's time for training:
with wandb.init(project=WANDB_PROJECT_NAME):
    config = wandb.config
    features = pd.read_csv('data/x_features.csv')
    vectorizer = TfidfVectorizer(ngram_range=(1,2))
    X_features = features = vectorizer.fit_transform(features['lemmatized_reason'])
    y_labels = pd.read_csv('data/y_labels.csv')
    split_data = train_test_split(X_features, y_labels, train_size = 0.85, test_size = 0.15, stratify=y_labels)
    features_train, labels_train = split_data[0], split_data[2]
    features_test, labels_test = split_data[1], split_data[3]
    config = wandb.config
    log_reg = LogisticRegression(
        penalty=config.penalty,
        C = config.C
    )
    log_reg.fit(features_train, labels_train)
    labels_pred = log_reg.predict(features_test)
    labels_proba = log_reg.predict_proba(features_test)
    labels=list(map(str,y_labels['label'].unique()))
    # Visualize single plot
    cm = wandb.sklearn.plot_confusion_matrix(labels_test, labels_pred, labels)
    score_f1 = f1_score(labels_test, labels_pred, average='weighted')
    sm = wandb.sklearn.plot_summary_metrics(
        log_reg, features_train, labels_train, features_test, labels_test)
    roc = wandb.sklearn.plot_roc(labels_test, labels_proba)
    wandb.log({
        "f1-weighted-log-regression-tfidf-skf": score_f1,
        "roc-log-regression-tfidf-skf": roc,
        "conf-mat-logistic-regression-tfidf-skf": cm,
        "summary-metrics-logistic-regression-tfidf-skf": sm
    })
And finally, sweep_id and the agent, outside of the with statement:
sweep_id = wandb.sweep(sweep_config, project="multiple-classifiers")
wandb.agent(sweep_id)
There is something major I am missing here with this config thing, that I just cannot understand.
I work at Weights & Biases. With wandb Sweeps, the idea is that wandb needs to be able to change the hyperparameters in the sweep.
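To make that concrete: with 'method': 'grid', the sweep tries every combination of the listed values, one run per combination. The sketch below only illustrates that idea in plain Python, using the parameters from the question. expand_grid is a made-up helper, not part of wandb, and it says nothing about how wandb implements the search internally.

```python
from itertools import product

# Parameter section copied from the question's sweep_config.
parameters = {
    "penalty": {"values": ["l2"]},
    "C": {"values": [0.01, 0.1, 1.0, 10.0, 100.0]},
}


def expand_grid(parameters):
    """Hypothetical helper: return one dict per combination of the listed values."""
    names = list(parameters)
    value_lists = [parameters[name]["values"] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


runs = expand_grid(parameters)
for run_config in runs:
    # Each dict stands for the values one run would see in its config.
    print(run_config)
```

Since penalty has one value and C has five, this prints five combinations, so a grid sweep over this config should amount to five runs.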
The section below, where the hyperparameters are passed to LogisticRegression:
config = wandb.config
log_reg = LogisticRegression(
    penalty=config.penalty,
    C=config.C
)
could also be rewritten like this:
log_reg = LogisticRegression(
    penalty=wandb.config.penalty,
    C=wandb.config.C
)
In other words, config = wandb.config only gives the same object a shorter name.
However, I think you're missing a train function or train script, which also needs to be passed to wandb. Without it, your example above won't work.
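The reason the function matters is that the agent has to run your training code once per set of hyperparameters, with the config holding different values each time. Here is a wandb-free simulation of that loop, as a rough mental model only: config and fake_agent are hypothetical stand-ins for wandb.config and wandb.agent, and the real library does much more (creating runs, talking to the server, and so on).

```python
from types import SimpleNamespace

# Hypothetical stand-in for wandb.config. In real code, wandb.init()
# fills wandb.config with the values chosen for the current run.
config = SimpleNamespace()


def fake_agent(function, candidates):
    """Hypothetical stand-in for wandb.agent: call `function` once per candidate."""
    results = []
    for values in candidates:
        # Swap in this run's hyperparameters before calling the function.
        vars(config).clear()
        vars(config).update(values)
        results.append(function())
    return results


def train():
    # The values are read at call time, so every call sees a different C.
    return f"LogisticRegression(penalty={config.penalty!r}, C={config.C})"


calls = fake_agent(train, [{"penalty": "l2", "C": 0.1}, {"penalty": "l2", "C": 10.0}])
for call in calls:
    print(call)
```

Code that sits at the top level of a script runs only once, so there is nothing for an agent to call again with the next value of C. That is the gap a train function fills.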
Below is a minimal example that should help. Hopefully the sweeps documentation can also help.
import numpy as np
import random
import wandb

# 🐝 Step 1: Define sweep config
sweep_configuration = {
    'method': 'random',
    'name': 'sweep',
    'metric': {'goal': 'maximize', 'name': 'val_acc'},
    'parameters':
    {
        'batch_size': {'values': [16, 32, 64]},
        'epochs': {'values': [5, 10, 15]},
        'lr': {'max': 0.1, 'min': 0.0001}
    }
}

# 🐝 Step 2: Initialize sweep by passing in config
sweep_id = wandb.sweep(sweep_configuration)

def train_one_epoch(epoch, lr, bs):
    acc = 0.25 + ((epoch/30) + (random.random()/10))
    loss = 0.2 + (1 - ((epoch-1)/10 + random.random()/5))
    return acc, loss

def evaluate_one_epoch(epoch):
    acc = 0.1 + ((epoch/20) + (random.random()/10))
    loss = 0.25 + (1 - ((epoch-1)/10 + random.random()/6))
    return acc, loss

def train():
    run = wandb.init()
    # 🐝 Step 3: Use hyperparameter values from `wandb.config`
    lr = wandb.config.lr
    bs = wandb.config.batch_size
    epochs = wandb.config.epochs
    for epoch in np.arange(1, epochs):
        train_acc, train_loss = train_one_epoch(epoch, lr, bs)
        val_acc, val_loss = evaluate_one_epoch(epoch)
        wandb.log({
            'epoch': epoch,
            'train_acc': train_acc,
            'train_loss': train_loss,
            'val_acc': val_acc,
            'val_loss': val_loss
        })

# 🐝 Step 4: Launch sweep by making a call to `wandb.agent`
wandb.agent(sweep_id, function=train, count=4)
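Applied to the logistic regression from the question, the same four steps might look roughly like the sketch below. It is untested against your data and rests on a few assumptions: the file paths and column names are the ones from the question, the logged key is renamed to f1-score so that it matches the metric name in the sweep config, and the wandb, pandas and scikit-learn imports sit inside the functions only so that the config can be inspected without those packages installed. launch_sweep is a hypothetical helper name, not a wandb function.

```python
# Step 1: sweep config, same values as in the question.
sweep_config = {
    "method": "grid",
    "metric": {"name": "f1-score", "goal": "maximize"},
    "parameters": {
        "penalty": {"values": ["l2"]},
        "C": {"values": [0.01, 0.1, 1.0, 10.0, 100.0]},
    },
}


def train():
    """One sweep run; the agent calls this once per hyperparameter combination."""
    import pandas as pd
    import wandb
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split

    with wandb.init():
        # Step 3: inside a sweep run, wandb.config holds this run's values.
        config = wandb.config

        texts = pd.read_csv("data/x_features.csv")["lemmatized_reason"]
        labels = pd.read_csv("data/y_labels.csv")["label"]
        features = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(texts)

        features_train, features_test, labels_train, labels_test = train_test_split(
            features, labels, train_size=0.85, stratify=labels
        )

        log_reg = LogisticRegression(penalty=config.penalty, C=config.C)
        log_reg.fit(features_train, labels_train)
        labels_pred = log_reg.predict(features_test)

        # Log under the same name as sweep_config["metric"]["name"].
        score_f1 = f1_score(labels_test, labels_pred, average="weighted")
        wandb.log({"f1-score": score_f1})


def launch_sweep():
    """Steps 2 and 4: register the sweep, then let the agent call train()."""
    import wandb

    sweep_id = wandb.sweep(sweep_config, project="multiple-classifiers")
    wandb.agent(sweep_id, function=train)
```

Calling launch_sweep() would start the sweep. The plots from the question (confusion matrix, ROC, summary metrics) could be logged in the same wandb.log call inside train; I left them out to keep the sketch short.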
Finally, can you share the link where you found the code above? Maybe we need to update some examples :)