python, machine-learning, optimization, hyperparameters, optuna

Why does Optuna even suggest taking the log of a parameter?


In the official Optuna tutorial there is an example that uses the log=True parameter of trial.suggest_int:

import torch
import torch.nn as nn


def create_model(trial, in_size):
    # The number of hidden layers is itself a tuned hyperparameter.
    n_layers = trial.suggest_int("n_layers", 1, 3)

    layers = []
    for i in range(n_layers):
        # The width of each layer is sampled from [4, 128] on a log scale.
        n_units = trial.suggest_int("n_units_l{}".format(i), 4, 128, log=True)
        layers.append(nn.Linear(in_size, n_units))
        layers.append(nn.ReLU())
        in_size = n_units
    layers.append(nn.Linear(in_size, 10))

    return nn.Sequential(*layers)

Why would someone take the logarithm of the number of neurons? There are also other instances of (IMO) redundant usage of log=True in the tutorial. Could someone explain the motivation, please?


Solution

  • In your example, with values in [4, 128], setting log=True chooses a real number uniformly from [log(4), log(128)] (that is, [2, 7] in base 2), then exponentiates the result, and finally rounds it to an integer. This makes smaller values more likely: each doubling of the range gets the same probability mass, so the range [4, 8] is about as likely to be sampled as [64, 128]. The sketch after the quoted docs below demonstrates this empirically.

    From the docs:

    If log is true, at first, the range of suggested values is divided into grid points of width 1. The range of suggested values is then converted to a log domain, from which a value is sampled. The uniformly sampled value is re-converted to the original domain and rounded to the nearest grid point that we just split, and the suggested value is determined. For example, if low = 2 and high = 8, then the range of suggested values is [2, 3, 4, 5, 6, 7, 8] and lower values tend to be more sampled than higher values.
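
    To see this empirically, here is a minimal sketch (the bucket boundaries and the count_buckets helper are made up for illustration) that uses a RandomSampler to draw n_units with and without log=True and counts how often the low end [4, 8] is hit compared to the high end [64, 128]:

    import optuna

    optuna.logging.set_verbosity(optuna.logging.WARNING)  # silence per-trial logs


    def count_buckets(log_flag, n_trials=10_000):
        # Dummy objective: we only care about the sampled values, not the score.
        def objective(trial):
            trial.suggest_int("n_units", 4, 128, log=log_flag)
            return 0.0

        study = optuna.create_study(sampler=optuna.samplers.RandomSampler(seed=0))
        study.optimize(objective, n_trials=n_trials)

        values = [t.params["n_units"] for t in study.trials]
        low = sum(4 <= v <= 8 for v in values)       # 5 of the 125 possible integers
        high = sum(64 <= v <= 128 for v in values)   # 65 of the 125 possible integers
        return low, high


    print("log=False:", count_buckets(False))  # low bucket rare (~4%), high bucket ~52%
    print("log=True: ", count_buckets(True))   # both buckets land roughly equally often

    With log=False the draw is uniform over the 125 integers, so the high bucket dominates; with log=True both buckets span one doubling (4 to 8 and 64 to 128), so they receive roughly the same share of samples.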