vowpalwabbit

How does the cb_adf algorithm know a new action is available in the data if no features are associated with the arms?


From the documentation I have read, the multiline cb_adf format is suitable for scenarios where the number of actions changes over time. My question is: how does the algorithm know that a new action is available? Is the following formatting of the logged bandit data correct?

two_actions = """
shared | a:0.5 b:1 c:2
0:-0.1:0.75 |
|
"""

and

three_actions_now = """
shared | a:0.5 b:1 c:2
|
0:-0.3:0.55 |
|
"""

And what about if one action is no longer available?


Solution

  • In this case you should give each arm that has no other features an identity feature. This is needed because cb_adf essentially defines an action as the set of its features, so arms with empty feature lines look identical to the learner.

    shared | a:0.5 b:1 c:2
    | action_1
    0:-0.3:0.55 | action_2
    | action_3
    

    If an action is no longer available, omit the line that corresponds to that action. For example, removing action_2 from the pool of available actions might look like this:

    shared | a:0.5 b:1 c:2
    | action_1
    | action_3
    

    cb_adf works best when each action has more than a single feature. For example, features shared across actions let the learner generalize: the reward observed for one action also informs the value of the same features on other actions.
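
As a rough illustration of the formatting described above, here is a minimal Python sketch that builds one multiline cb_adf example as a string, with one identity feature per available action. The helper name `format_cb_adf_example` and its parameters are hypothetical (they are not part of Vowpal Wabbit), and the sketch only produces text; it does not call the library.

```python
def format_cb_adf_example(shared, action_names, chosen=None, cost=None, prob=None):
    """Build one multiline cb_adf example as a string (hypothetical helper).

    shared       -- dict mapping shared feature names to values
    action_names -- identity features, one entry per currently available action
    chosen       -- identity feature of the logged action, or None if unlabeled
    cost, prob   -- logged cost and probability of the chosen action
    """
    shared_part = " ".join(f"{name}:{value}" for name, value in shared.items())
    lines = [f"shared | {shared_part}"]
    for name in action_names:
        if name == chosen:
            # Labeled line, in the same 0:cost:probability form as the examples above.
            lines.append(f"0:{cost}:{prob} | {name}")
        else:
            # Unlabeled line: the action was available but not taken.
            lines.append(f"| {name}")
    return "\n".join(lines)


shared = {"a": 0.5, "b": 1, "c": 2}

# Three actions available, action_2 was chosen.
three_actions = format_cb_adf_example(
    shared, ["action_1", "action_2", "action_3"],
    chosen="action_2", cost=-0.3, prob=0.55,
)

# action_2 is no longer available, so its line is simply left out.
two_actions = format_cb_adf_example(shared, ["action_1", "action_3"])

print(three_actions)
print()
print(two_actions)
```

Adding or removing an action then amounts to changing the list passed as `action_names`. When several such examples are written to a data file, they are separated by a blank line.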