python, wandb

What is the meaning of 'config = wandb.config'?


I am trying to set up a sweep for my logistic regression model. I have read the wandb tutorials, but I cannot understand how to write the configuration, and especially what config = wandb.config in the tutorials means. I would really appreciate it if someone could give me a good explanation of the steps. Here is what I've done:

sweep_config = {
    'method': 'grid'
}

metric = {
    'name': 'f1-score',
    'goal': 'maximize'
}

sweep_config['metric'] = metric

parameters = {
    'penalty': {
        'values': ['l2']
    },
    'C': {
        'values': [0.01, 0.1, 1.0, 10.0, 100.0]
    }
}

sweep_config['parameters'] = parameters

Then I create the YAML file:

with open('config.yaml', 'w') as stream:
    yaml.dump(sweep_config, stream)

Then it's time for training:

with wandb.init(project=WANDB_PROJECT_NAME):
    config = wandb.config
    
    features = pd.read_csv('data/x_features.csv')
    vectorizer = TfidfVectorizer(ngram_range=(1,2))

    X_features = vectorizer.fit_transform(features['lemmatized_reason'])

    y_labels = pd.read_csv('data/y_labels.csv')

    split_data = train_test_split(X_features, y_labels, train_size = 0.85, test_size = 0.15, stratify=y_labels)
    features_train, labels_train = split_data[0], split_data[2]
    features_test, labels_test = split_data[1], split_data[3]
    
    config = wandb.config
    log_reg = LogisticRegression(
        penalty=config.penalty,
        C = config.C
    )
    
    log_reg.fit(features_train, labels_train)
    
    labels_pred = log_reg.predict(features_test)
    labels_proba = log_reg.predict_proba(features_test)
    labels=list(map(str,y_labels['label'].unique()))
    
    # Visualize single plot
    cm = wandb.sklearn.plot_confusion_matrix(labels_test, labels_pred, labels)
    
    score_f1 = f1_score(labels_test, labels_pred, average='weighted')
    
    sm = wandb.sklearn.plot_summary_metrics(
    log_reg, features_train, labels_train, features_test, labels_test)
    
    roc = wandb.sklearn.plot_roc(labels_test, labels_proba)
    
    wandb.log({
        "f1-weighted-log-regression-tfidf-skf": score_f1, 
        "roc-log-regression-tfidf-skf": roc, 
        "conf-mat-logistic-regression-tfidf-skf": cm,
        "summary-metrics-logistic-regression-tfidf-skf": sm
        })

And finally, I create the sweep_id and start the agent outside of the with statement:

sweep_id = wandb.sweep(sweep_config, project="multiple-classifiers")
wandb.agent(sweep_id)

There is something major about this config thing that I am missing and just cannot understand.


Solution

  • I work at Weights & Biases. With wandb Sweeps, the idea is that the sweep agent needs to be able to change the hyperparameters from run to run, and it does this through wandb.config: when a run is launched by a sweep agent, wandb.config is pre-filled with the hyperparameter values the sweep controller chose for that run, and config = wandb.config is simply a short alias so that your training code can read those values (e.g. config.penalty, config.C).

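    As a quick illustration (a sketch; the defaults below are just an example, not something you need): you can pass defaults to wandb.init, and when the run is started by a sweep agent those defaults are overridden by the values the sweep picked. The attribute names on wandb.config are simply the keys you put under 'parameters' in your sweep config, so in your case config.penalty and config.C.

    import wandb

    # Defaults for a normal (non-sweep) run; a sweep agent overrides these
    # with the values it chose for the current run.
    run = wandb.init(config={'penalty': 'l2', 'C': 1.0})

    config = wandb.config            # dictionary-like object holding the hyperparameters
    print(config.penalty, config.C)
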
    The section below, where the hyperparameters are passed to LogisticRegression, could also be rewritten

    config = wandb.config
    log_reg = LogisticRegression(
        penalty=config.penalty,
        C = config.C
    )
    

    like this:

    log_reg = LogisticRegression(
        penalty=wandb.config.penalty,
        C = wandb.config.C
    )
    

    However, I think you're missing a train function (or training script), which also needs to be passed to wandb. Without it, your example above won't work. (A sketch of how your own code could be wrapped is included after the minimal example below.)

    Below is a minimal example that should help; the sweeps documentation may also be useful.

    import numpy as np 
    import random
    import wandb
    
    # 🐝 Step 1: Define sweep config
    sweep_configuration = {
        'method': 'random',
        'name': 'sweep',
        'metric': {'goal': 'maximize', 'name': 'val_acc'},
        'parameters': 
        {
            'batch_size': {'values': [16, 32, 64]},
            'epochs': {'values': [5, 10, 15]},
            'lr': {'max': 0.1, 'min': 0.0001}
         }
    }
    
    # 🐝 Step 2: Initialize sweep by passing in config
    sweep_id = wandb.sweep(sweep_configuration)
    
    # The two helpers below just fabricate accuracy/loss values so the example runs without real data
    def train_one_epoch(epoch, lr, bs):
      acc = 0.25 + ((epoch/30) +  (random.random()/10))
      loss = 0.2 + (1 - ((epoch-1)/10 +  random.random()/5))
      return acc, loss
    
    def evaluate_one_epoch(epoch): 
      acc = 0.1 + ((epoch/20) +  (random.random()/10))
      loss = 0.25 + (1 - ((epoch-1)/10 +  random.random()/6))
      return acc, loss
    
    def train():
        run = wandb.init()
    
        # 🐝 Step 3: Use hyperparameter values from `wandb.config`
        lr  =  wandb.config.lr
        bs = wandb.config.batch_size
        epochs = wandb.config.epochs
    
        for epoch in np.arange(1, epochs):
          train_acc, train_loss = train_one_epoch(epoch, lr, bs)
          val_acc, val_loss = evaluate_one_epoch(epoch)
    
          wandb.log({
            'epoch': epoch, 
            'train_acc': train_acc,
            'train_loss': train_loss, 
            'val_acc': val_acc, 
            'val_loss': val_loss
          })
    
    # 🐝 Step 4: Launch sweep by making a call to `wandb.agent`
    wandb.agent(sweep_id, function=train, count=4)
    

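    Applied to your own code, the same pattern would look roughly like this. This is only a sketch: it reuses the preprocessing from your question, assumes sweep_config is the dict you built above, and logs the F1 score under the name 'f1-score' so that it matches the metric named in your sweep config.

    import pandas as pd
    import wandb
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split

    def train():
        # The sweep agent starts the run and sets the project, so no arguments are needed here
        with wandb.init():
            config = wandb.config

            # Same preprocessing as in your question
            features = pd.read_csv('data/x_features.csv')
            vectorizer = TfidfVectorizer(ngram_range=(1, 2))
            X_features = vectorizer.fit_transform(features['lemmatized_reason'])
            y_labels = pd.read_csv('data/y_labels.csv')

            features_train, features_test, labels_train, labels_test = train_test_split(
                X_features, y_labels, train_size=0.85, test_size=0.15, stratify=y_labels)

            # The sweep agent fills wandb.config with the values it picked for this run
            log_reg = LogisticRegression(penalty=config.penalty, C=config.C)
            log_reg.fit(features_train, labels_train)

            labels_pred = log_reg.predict(features_test)
            score_f1 = f1_score(labels_test, labels_pred, average='weighted')

            # Log under the same name as the sweep metric so the sweep can track it
            wandb.log({'f1-score': score_f1})

    sweep_id = wandb.sweep(sweep_config, project="multiple-classifiers")
    wandb.agent(sweep_id, function=train)
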
    Finally, can you share the link where you found the code above? Maybe we need to update some examples :)