From the documentation I have read, the cb_adf multiline format is suitable for scenarios where the number of actions changes over time. My question is: how does the algorithm know that a new action is available? Is formatting the logged bandit data like the following correct?
two_actions = """
shared | a:0.5 b:1 c:2
0:-0.1:0.75 |
|
"""
and
three_actions_now = """
shared | a:0.5 b:1 c:2
|
0:-0.3:0.55 |
|
"""
And what if an action is no longer available?
In this case you should use an identity feature for each arm that has no other features. This is because in cb_adf an action is essentially defined by the set of its features.
shared | a:0.5 b:1 c:2
| action_1
0:-0.3:0.55 | action_2
| action_3
If an action is no longer available, omit the line that corresponds to that action. For example, removing action_2 from the pool of actions to choose from might look like this:
shared | a:0.5 b:1 c:2
| action_1
| action_3
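As a rough sketch of how both cases could be generated from logged data, the helper below builds one multiline example from whatever actions are available at that moment. The function name `format_cb_adf` and its parameters are hypothetical and not part of Vowpal Wabbit; it only produces text in the same layout as the examples above.

```python
def format_cb_adf(shared, actions, chosen=None, cost=None, prob=None):
    """Build one multiline cb_adf example as a string (hypothetical helper).

    shared  -- dict mapping shared feature name -> value
    actions -- identity feature names, one per currently available action
    chosen  -- identity of the logged action, or None if there is no label
    cost, prob -- logged cost and probability for the chosen action
    """
    shared_feats = " ".join(f"{name}:{value}" for name, value in shared.items())
    lines = [f"shared | {shared_feats}"]
    for name in actions:
        # Only the line of the logged action carries a label. The leading 0
        # mirrors the examples above.
        label = f"0:{cost}:{prob} " if name == chosen else ""
        lines.append(f"{label}| {name}")
    return "\n".join(lines)


context = {"a": 0.5, "b": 1, "c": 2}

# All three actions available, action_2 was the logged choice.
three_actions = format_cb_adf(
    context, ["action_1", "action_2", "action_3"],
    chosen="action_2", cost=-0.3, prob=0.55,
)

# action_2 is no longer available, so its line is simply left out.
two_actions = format_cb_adf(context, ["action_1", "action_3"])

print(three_actions)
print()
print(two_actions)
```

Because the set of actions is passed in per example, adding or removing an action needs no special signal: the line is either present or absent.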
cb_adf works best when each action has more than a single feature. For example, when actions share features, a reward observed for one action also teaches the learner about the other actions that have those features.
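To illustrate, here is a minimal sketch of action lines that carry descriptive features alongside the identity feature. The feature names `topic_sports` and `length`, and the helper `action_line`, are made up for this example; substitute whatever actually describes your actions.

```python
def action_line(identity, features, label=None):
    """Format one action line: identity feature plus descriptive features.

    features maps feature name -> value; a value of None means a plain
    (valueless) feature such as a category flag.
    """
    feats = " ".join(
        name if value is None else f"{name}:{value}"
        for name, value in features.items()
    )
    prefix = f"{label} " if label else ""
    return f"{prefix}| {identity} {feats}".rstrip()


# Hypothetical features: both actions share topic_sports, so a reward seen
# on one of them also informs the estimate for the other.
actions = {
    "action_1": {"topic_sports": None, "length": 0.3},
    "action_3": {"topic_sports": None, "length": 0.8},
}

example = "\n".join(
    ["shared | a:0.5 b:1 c:2"]
    + [action_line(name, feats) for name, feats in actions.items()]
)
print(example)
```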