
Vowpal Wabbit: limits on the size of the action set for Contextual Bandits?


For the contextual bandit framework of Vowpal Wabbit, are there any limits on how large the number of actions can be? I'm assuming that currently there is no support for problems with an infinite action set (e.g. an l2 ball in R^n). But are there any limits on how large a finite set of actions can be? Or is that limited only by the hardware the library runs on?

The potential problems I can think of are floating-point errors (for example, when predicting the PMF over the set of actions), slow predictions and updates, and specific exploration policies or policy evaluation approaches not playing well with a large action space.

Edit: the number of actions I'm considering is in the range of 1,000 to 100,000.


Solution

  • "I'm assuming that currently there is no support for problems with an infinite action set"

    Correct, this isn't supported at the moment.

    "But are there any limits on how large a finite set of actions can be? Or is that limited only by the hardware the library runs on?"

    I don't believe there are any artificial limits on action set size, so hardware is probably the practical limit. Internally, the action ID is a 32-bit number, so there is a hard upper bound of 2^32 actions. As for the other issues, if you run into any of them, please feel free to open an issue and we can work with you to get them sorted out. It's definitely the kind of thing that should be fixed.
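For a rough sense of scale, here is a minimal Python sketch (plain standard library, not Vowpal Wabbit code) that checks the two numeric concerns from the question for an action set of 100,000. The function names are hypothetical, the epsilon-greedy PMF is only an illustrative exploration policy, and storing probabilities in single precision is an assumption made for the sake of the check, not something confirmed above.

```python
import math
import struct

# Upper bound implied by a 32-bit action ID (as described in the answer).
MAX_ACTION_IDS = 2**32


def fits_in_32_bit_id(num_actions):
    """Return True if every action can get a distinct 32-bit ID."""
    return 0 < num_actions <= MAX_ACTION_IDS


def to_float32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def epsilon_greedy_pmf(num_actions, best_action, epsilon):
    """Illustrative PMF: epsilon spread uniformly, the rest on one action."""
    base = epsilon / num_actions
    pmf = [base] * num_actions
    pmf[best_action] += 1.0 - epsilon
    return pmf


if __name__ == "__main__":
    k = 100_000
    pmf = epsilon_greedy_pmf(k, best_action=0, epsilon=0.05)
    smallest = min(pmf)
    print("fits in 32-bit ID:", fits_in_32_bit_id(k))
    print("smallest probability:", smallest)
    print("same value as float32:", to_float32(smallest))
    print("PMF total (exact summation):", math.fsum(pmf))
```

With 100,000 actions the smallest probability is on the order of 1e-7 to 1e-6, which single precision represents without trouble. The more realistic costs at this scale are likely to be prediction and update time, and accumulated rounding when many small probabilities are summed naively.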