
Using NNET for classification


I am new to neural networks and I have a question about classification with the nnet package.

I have data which is a mixture of numeric and categorical variables. I wanted to make a win/lose prediction using nnet with a function call such as

nnet(WL~., data=training, size=10) 

but this gives a different result than if I use a data frame containing only numeric versions of the variables (i.e. converting all the factors to numeric, except my prediction WL).

Can someone explain to me what is happening here? I guess nnet is interpreting the variables differently, but I would like to understand what is happening. I appreciate it's difficult without any data to recreate the problem, but I am just looking for a high-level explanation of how neural networks are fitted using nnet. I can't find this anywhere. Many thanks.

str(training)
'data.frame':   1346 obs. of  9 variables:
 $ WL                   : Factor w/ 2 levels "win","lose": 2 2 1 1 NA 1 1 2 2 2 ...
 $ team.rank            : int  17 19 19 18 17 16 15 14 14 16 ...
 $ opponent.rank        : int  14 12 36 16 12 30 11 38 27 31 ...
 $ HA                   : Factor w/ 2 levels "A","H": 1 1 2 2 2 2 2 1 1 2 ...
 $ comp.stage           : Factor w/ 3 levels "final","KO","league": 3 3 3 3 3 3 3 3 3 3 ...
 $ days.since.last.match: num  132 9 5 7 14 7 7 7 14 7 ...
 $ days.to.next.match   : num  9 5 7 14 7 9 7 9 7 8 ...
 $ comp.last.match      : Factor w/ 5 levels "Anglo-Welsh Cup",..: 5 5 5 5 5 5 3 5 3 5 ...
 $ comp.next.match      : Factor w/ 4 levels "Anglo-Welsh Cup",..: 4 4 4 4 4 3 4 3 4 3 ...

vs

str(training.nnet)
'data.frame':   1346 obs. of  9 variables:
 $ WL                   : Factor w/ 2 levels "win","lose": 2 2 1 1 NA 1 1 2 2 2 ...
 $ team.rank            : int  17 19 19 18 17 16 15 14 14 16 ...
 $ opponent.rank        : int  14 12 36 16 12 30 11 38 27 31 ...
 $ HA                   : num  1 1 2 2 2 2 2 1 1 2 ...
 $ comp.stage           : num  3 3 3 3 3 3 3 3 3 3 ...
 $ days.since.last.match: num  132 9 5 7 14 7 7 7 14 7 ...
 $ days.to.next.match   : num  9 5 7 14 7 9 7 9 7 8 ...
 $ comp.last.match      : num  5 5 5 5 5 5 3 5 3 5 ...
 $ comp.next.match      : num  4 4 4 4 4 3 4 3 4 3 ...

Solution

  • The difference you are looking for can be explained with a very small example:

    library(nnet)

    fit.factors <- nnet(y ~ x, data.frame(y=c('W', 'L', 'W'), x=c('1', '2', '3')), size=1)
    fit.factors
    # a 2-1-1 network with 5 weights
    # inputs: x2 x3 
    # output(s): y 
    # options were - entropy fitting 
    
    fit.numeric <- nnet(y ~ x, data.frame(y=c('W', 'L', 'W'), x=c(1, 2, 3)), size=1)
    fit.numeric
    # a 1-1-1 network with 4 weights
    # inputs: x 
    # output(s): y 
    # options were - entropy fitting 
    

    When fitting models in R, factor variables are automatically expanded into a set of indicator (dummy) variables.

    Hence a factor variable x = c('1', '2', '3') is expanded into three indicator variables x1, x2, x3, exactly one of which holds the value 1 while the others hold 0. Since the levels {1, 2, 3} are exhaustive, one (and only one) of x1, x2, x3 must be 1, so the indicators are not independent: x1 + x2 + x3 = 1. We can therefore drop the first variable x1 and keep only x2 and x3 in the model, concluding that the level is 1 whenever both x2 == 0 and x3 == 0.
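    This expansion can be inspected directly with model.matrix(), which is the same machinery nnet's formula interface relies on (a small illustration; the attribute lines R prints after the matrix are omitted from the comments):

        # A three-level factor expands to an intercept plus two indicator
        # columns; the first level ('1') is the baseline and gets no column.
        x <- factor(c('1', '2', '3'))
        model.matrix(~ x)
        #   (Intercept) x2 x3
        # 1           1  0  0
        # 2           1  1  0
        # 3           1  0  1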

    That is what you see in the output of nnet: when x is a factor there are length(levels(x)) - 1 inputs to the neural network, whereas when x is numeric there is only one input, x itself.

    Most R modelling functions (nnet, randomForest, glm, gbm, etc.) perform this mapping from factor levels to dummy variables internally, so as a user you normally do not need to be aware of it.
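    You can see the same expansion in the coefficient names of any formula-based fit, for example lm() (a toy illustration with made-up response values):

        # The factor x shows up in the fit as the dummy coefficients x2 and x3,
        # with level '1' absorbed into the intercept.
        d <- data.frame(y = c(1.0, 2.5, 2.0), x = factor(c('1', '2', '3')))
        names(coef(lm(y ~ x, data = d)))
        # [1] "(Intercept)" "x2"          "x3"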


    Now it should be clear what the difference is between using a dataset with factors and one with numbers replacing the factors. By converting to numbers, you are:

    1. Losing the distinct identity of each level, collapsing the differences between levels onto a fixed numeric scale.
    2. Enforcing an ordering on the levels.

    This does yield a slightly simpler model (fewer inputs, since no dummy variables are needed for each level), but it is often not the correct thing to do.
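    If you genuinely need an all-numeric representation (say, to use nnet's matrix interface), one-hot encode the factors with model.matrix() instead of as.numeric(). A hedged sketch against the training frame from the question — untested here, since the data is not available:

        library(nnet)

        # Drop rows with missing values first (WL contains NAs in the question).
        tr <- na.omit(training)

        # Dummy-code all predictors; '- 1' drops the intercept column.
        X <- model.matrix(WL ~ . - 1, data = tr)

        # class.ind() one-hot encodes the target for softmax fitting.
        y <- class.ind(tr$WL)

        fit <- nnet(X, y, size = 10, softmax = TRUE)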