machine-learning reinforcement-learning vowpalwabbit

How to understand the slots in the vw.format - Vowpal Wabbit Conditional Contextual Bandit


I am trying to build a contextual bandit. Since I would like to rank the actions, I want to switch to a conditional contextual bandit (as I have read here).

But now I have trouble understanding the new vw format.

The example from the Vowpal Wabbit wiki is this:

ccb shared | s_1 s_2
ccb action | a:1 b:1 c:1
ccb action | a:0.5 b:2 c:1
ccb action | a:0.5 
ccb action | c:1
ccb slot  | d:4
ccb slot 1:0.8:0.8,0:0.2 0,1,3 | d:7

Unfortunately, I do not understand the slot part. I gather that the label in ccb slot 1:0.8:0.8,0:0.2 0,1,3 gives the cost and probability for the chosen action. Is it possible to have more than one chosen action?

I also do not understand why it needs features for the slot part. Furthermore, I do not fully understand why we have to tell it the action ids to include. What is the purpose of it? Also, which format does it need for prediction? Why does it need the slot part if I do not have any action costs yet?

Edit: I looked into the Azure docs, since Vowpal Wabbit has been developed by MS Research. I think I found useful information there. As soon as I have found the answers, I will post them here.


Solution

  • You may be interested in VW's wiki page, which has some information on CCB:

    https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Conditional-Contextual-Bandit

    You can think of CCB as a layer above CB: it runs a separate CB example over the actions for each slot, and once a slot selects an action, that action is excluded from all later slots. The example above likely uses more functionality than is necessary for your purposes. If you only want to rank actions, this would be a better format:

    ccb shared | s_1 s_2
    ccb action | a:1 b:1 c:1
    ccb action | a:0.5 b:2 c:1
    ccb action | a:0.5
    ccb action | c:1
    ccb slot 1:0.8:0.8 |
    ccb slot 0:0.8:0.8 |
    ccb slot 3:0.8:0.8 |
    ccb slot 2:0.8:0.8 |
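
If you generate such examples from logged rankings, a small helper can assemble the text for you. The following is only a sketch: build_ccb_example and its argument layout are hypothetical names chosen for illustration, not part of VW, and the costs and probabilities are simply the ones from the example above.

```python
def build_ccb_example(shared, actions, slot_labels):
    """Return one multi-line CCB example in VW text format.

    shared:      feature string for the shared context, e.g. "s_1 s_2"
    actions:     list of feature strings, one per action
    slot_labels: list of (action_id, cost, probability), one per slot
    """
    lines = [f"ccb shared | {shared}"]
    lines += [f"ccb action | {features}" for features in actions]
    for action_id, cost, prob in slot_labels:
        # One labeled slot per ranked position, with no slot features.
        lines.append(f"ccb slot {action_id}:{cost}:{prob} |")
    return "\n".join(lines)


example = build_ccb_example(
    "s_1 s_2",
    ["a:1 b:1 c:1", "a:0.5 b:2 c:1", "a:0.5", "c:1"],
    [(1, 0.8, 0.8), (0, 0.8, 0.8), (3, 0.8, 0.8), (2, 0.8, 0.8)],
)
print(example)
```

The printed text matches the labeled example above line for line, so it can be written to a data file as is.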

    This example can be used for both learning and prediction. If you only want to do prediction, you can use unlabeled slots like this:

    ccb shared | s_1 s_2
    ccb action | a:1 b:1 c:1
    ccb action | a:0.5 b:2 c:1
    ccb action | a:0.5
    ccb action | c:1
    ccb slot |
    ccb slot |
    ccb slot |
    ccb slot |

    I'll answer each of your questions separately:

    I also do not understand why it needs features for the slot part?
    Slots do not need features. They are allowed to have features if you want each slot to learn differently, but this is not a requirement. If you are trying to rank actions, you probably don't want slot-specific features.

    Furthermore I do not fully understand why we have to tell it the action ids to include?
    You don't need to do this. By default, all actions which have not yet been selected are included in each slot. As an example, if there are actions 0,1,2,3 and slot 0 selects action 1, then slot 1 will have actions 0,2,3 available. In this sense, each later slot will not include the actions selected by prior slots, so the actions chosen by the slots, read in order, give you the ranking.
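
The default exclusion rule can be illustrated with a few lines of plain Python. This is only a model of the bookkeeping described here, not VW's internal code; the function name available_actions_per_slot is made up for the illustration.

```python
def available_actions_per_slot(num_actions, chosen):
    """For each slot, list the action ids still on offer before it chooses.

    chosen: the action id picked by each slot, in slot order.
    """
    remaining = list(range(num_actions))
    available = []
    for action_id in chosen:
        available.append(list(remaining))
        remaining.remove(action_id)  # a chosen action is not offered again
    return available


per_slot = available_actions_per_slot(4, [1, 0, 3, 2])
for slot, actions in enumerate(per_slot):
    print(f"slot {slot}: {actions}")
```

With four actions and slot 0 choosing action 1, the remaining slots choose among ever smaller sets, which is why the sequence of chosen actions can be read as a ranking.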

    What is the purpose of it?
    If you wanted a more complicated system with specialized slots, you might explicitly restrict which actions a slot can choose from, but for simply ranking actions you will not want to do this.
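
To see how the pieces of the slot line from the question fit together, here is a small parser sketch. It assumes the layout "ccb slot [chosen:cost:prob[,action:prob,...]] [ids to include] | features" as I read it from the wiki page; parse_ccb_slot is a hypothetical helper, not a VW function.

```python
def parse_ccb_slot(line):
    """Split a 'ccb slot' line into its label, include list and features."""
    head, _, features = line.partition("|")
    tokens = head.split()[2:]  # drop the leading "ccb" and "slot"
    parsed = {"chosen": None, "cost": None, "probabilities": {},
              "include": None, "features": features.strip()}
    for token in tokens:
        if ":" in token:
            # Outcome: chosen_action:cost:prob, then optional action:prob pairs.
            first, *rest = token.split(",")
            action, cost, prob = first.split(":")
            parsed["chosen"] = int(action)
            parsed["cost"] = float(cost)
            parsed["probabilities"][int(action)] = float(prob)
            for pair in rest:
                other, other_prob = pair.split(":")
                parsed["probabilities"][int(other)] = float(other_prob)
        else:
            # Explicit list of action ids this slot may choose from.
            parsed["include"] = [int(a) for a in token.split(",")]
    return parsed


slot = parse_ccb_slot("ccb slot 1:0.8:0.8,0:0.2 0,1,3 | d:7")
print(slot)
```

Under this reading, the slot in the question chose action 1 at cost 0.8 with probability 0.8, action 0 had probability 0.2, and the slot was restricted to actions 0, 1 and 3. There is a single chosen action per slot.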

    Also which format does it need for the prediction?
    Predictions can be done on any CCB example (labeled or unlabeled). You don't need to do anything special with the example, you only have to specify the -p pred_file.txt flag to output predictions to that file.

    Why does it need the slot part if I do not have any action costs yet?
    If you have no costs (so you are only doing prediction), the number of slots will represent the number of predictions you want to make. If you only want to find the top n actions you could use only n slots. Let me know if you have any other questions.
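
As a sketch of the top-n case, the unlabeled example can be generated with one empty slot per prediction you want. The helper name build_ccb_prediction_example is hypothetical and only illustrates the text format shown earlier.

```python
def build_ccb_prediction_example(shared, actions, num_slots):
    """Return an unlabeled CCB example that asks for num_slots predictions."""
    lines = [f"ccb shared | {shared}"]
    lines += [f"ccb action | {features}" for features in actions]
    lines += ["ccb slot |"] * num_slots  # one empty slot per wanted prediction
    return "\n".join(lines)


top2 = build_ccb_prediction_example(
    "s_1 s_2",
    ["a:1 b:1 c:1", "a:0.5 b:2 c:1", "a:0.5", "c:1"],
    num_slots=2,
)
print(top2)
```

Here only two slots are emitted, so only the top two actions would be predicted. The resulting text can be saved to a file and passed to vw together with the -p flag mentioned above.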