Tags: r, indexing, torch

In torch for R, fitting with a dataloader yields an "Indexing starts at 1 but found a 0" error


I keep getting the error "Indexing starts at 1 but found a 0" when fitting a model where the input or output started life as R objects (vectors, matrices, or arrays), but only when using a dataloader instead of the dataset itself. Here's a toy example:


library(torch)
library(luz) # For fit() function

x <- torch_randn(1, 9)
y <- torch_tensor(as.integer(c(0,0,1)))
# y <- torch_tensor(as.integer(1)) # This doesn't give an error if out_features=1 below

xy.ds <- tensor_dataset(x,y)
xy.dl <- dataloader(xy.ds, batch_size = 1)

# Create one-layer linear model

linnet <- nn_module(
  initialize = function() {
    self$fc <- nn_linear(in_features = 9, out_features = 3)
  },
  forward = function(x) {
    self$fc(x)
  }
)

fitted <- linnet %>%
  setup(
   loss = nn_cross_entropy_loss(),
   optimizer = optim_adam
  )  %>%
  fit(xy.dl, epochs = 1) # Error in to_index_tensor(target) : Indexing starts at 1 but found a 0.
  # fit(xy.ds, epochs = 1) # This doesn't give an error

Maybe this is some sort of bug in the Python-to-R port. Maybe my computer is having trouble accessing some library (I'm using the R GUI on Windows 11, with the latest versions of R, torch, and luz). Maybe the problem relates to nn_cross_entropy_loss() (which I need for my real project, which uses R-created arrays and nn_conv2d()). Or maybe I'm just being stupid. In any case, I can't even search for help on the function that's complaining: to_index_tensor().


Solution

  • CrossEntropyLoss & zeros as targets

    torch in R always uses 1-based indices. In this configuration, CrossEntropyLoss targets are index tensors (representing discrete classes, e.g. MNIST digits or as.factor(c("cat", "dog"))), and they are explicitly checked for zeros in to_index_tensor():

    // [[Rcpp::export]]
    XPtrTorchTensor to_index_tensor(XPtrTorchTensor t) {
      // check that there's no zeros
      bool zeros = lantern_Tensor_has_any_zeros(t.get());
      if (zeros) {
        Rcpp::stop("Indexing starts at 1 but found a 0.");
      }
    ...
    

    which is called from the internal torch_cross_entropy_loss():

    torch:::torch_cross_entropy_loss
    #> function (self, target, weight = list(), reduction = torch_reduction_mean(), 
    #>     ignore_index = -100L) 
    #> {
    #>     target <- to_index_tensor(target)
    #>     .torch_cross_entropy_loss(self = self, target = target, weight = weight, 
    #>         reduction = reduction, ignore_index = ignore_index)
    #> }
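
    The question mentions not being able to find help for to_index_tensor(); it is an unexported internal, but unexported objects can still be located and inspected from base R via ::: and asNamespace() (assuming the torch package is installed):

```r
library(torch)

# to_index_tensor() is not exported, but it lives in torch's namespace,
# so it can be found and printed without any documentation page:
exists("to_index_tensor", envir = asNamespace("torch"))
torch:::to_index_tensor
```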
    

    We can test this with 0 and 1 targets directly, without going through a dataloader or luz:

    library(torch)
    x <- torch_rand(1,9)
    
    nnf_cross_entropy(x, target = torch_tensor(0L))
    #> Error in `to_index_tensor()`:
    #> ! Indexing starts at 1 but found a 0.
    #> Backtrace:
    #>     ▆
    #>  1. └─torch::nnf_cross_entropy(x, target = torch_tensor(0L))
    #>  2.   └─torch:::torch_cross_entropy_loss(...)
    #>  3.     └─torch:::to_index_tensor(target)
    
    nnf_cross_entropy(x, target = torch_tensor(1L))
    #> torch_tensor
    #> 0.965333
    #> [ CPUFloatType{} ]
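
    As a side note on why 1-based targets are natural in R: factor levels are already coded 1, 2, ..., so class labels converted with as.factor() give valid index targets without any shifting. A base-R sketch, no torch needed:

```r
# Factor levels in R are coded starting at 1, so factor-derived
# class labels are already valid 1-based index targets:
labels <- as.factor(c("cat", "dog", "cat"))
as.integer(labels)
#> [1] 1 2 1
```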
    

    And now the original example, with 1 added to the labels to avoid zeros:

    library(torch)
    library(luz) 
    
    (x <- torch_randn(1, 9))
    #> torch_tensor
    #>  0.4351 -0.3599  0.5954  0.3883 -1.3409 -0.7897  0.1219 -1.2122  2.5252
    #> [ CPUFloatType{1,9} ]
    
    # add +1 to target classes to avoid "found a 0" error when using CrossEntropyLoss 
    (y <- torch_tensor(as.integer(c(0, 0, 1) + 1)))
    #> torch_tensor
    #>  1
    #>  1
    #>  2
    #> [ CPULongType{3} ]
    
    xy.ds <- tensor_dataset(x,y)
    xy.dl <- dataloader(xy.ds, batch_size = 1)
    
    linnet <- nn_module(
      initialize = function() {
        self$fc <- nn_linear(in_features = 9, out_features = 3)
      },
      forward = function(x) {
        self$fc(x)
      }
    )
    
    fitted <- linnet %>%
      setup(
        loss = nn_cross_entropy_loss(),
        optimizer = optim_adam
      )  %>%
      fit(xy.dl, epochs = 1) 
    fitted
    #> A `luz_module_fitted`
    #> ── Time ────────────────────────────────────────────────────────────────────────
    #> • Total time: 172ms
    #> • Avg time per training epoch: 165ms
    #> 
    #> ── Results ─────────────────────────────────────────────────────────────────────
    #> Metrics observed in the last epoch.
    #> 
    #> ℹ Training:
    #> loss: 2.366
    #> 
    #> ── Model ───────────────────────────────────────────────────────────────────────
    #> An `nn_module` containing 30 parameters.
    #> 
    #> ── Modules ─────────────────────────────────────────────────────────────────────
    #> • fc: <nn_linear> #30 parameters
    

    Testing dataloaders

    You may also want to check that the resulting dataset and the values coming out of the dataloader are as expected; currently only the first y value is actually used:

    # test the dataset & the dataloader iterator (no fixed seed, so values differ between runs):
    xy.ds$.getitem(1)
    #> [[1]]
    #> torch_tensor
    #>  0.0689
    #>  0.3410
    #>  0.1130
    #> -0.0152
    #>  0.2865
    #>  0.4140
    #>  1.4283
    #>  0.4356
    #>  0.3323
    #> [ CPUFloatType{9} ]
    #> 
    #> [[2]]
    #> torch_tensor
    #> 1
    #> [ CPULongType{} ]
    xy.dl$.iter()$.next()
    #> [[1]]
    #> torch_tensor
    #>  0.0689  0.3410  0.1130 -0.0152  0.2865  0.4140  1.4283  0.4356  0.3323
    #> [ CPUFloatType{1,9} ]
    #> 
    #> [[2]]
    #> torch_tensor
    #>  1
    #> [ CPULongType{1} ]
    

    This is because of dimensionality: tensor_dataset() indexes its input tensors along the first dimension, and here x (shape {1,9}) contributes only a single observation while y (shape {3}) looks like three. The first iteration returns the first observation of each tensor and also exhausts the dataloader, as there is nothing else to pull from x. So you probably want to make sure tensor_dataset() input tensors are properly shaped to match your network architecture and loss function, either through the shapes of the input R objects:

    torch_tensor(matrix(as.integer(c(0, 0, 1) + 1), ncol = 3, byrow = TRUE))
    #> torch_tensor
    #>  1  1  2
    #> [ CPULongType{1,3} ]
    

    Or by reshaping the tensors:

    y$reshape(c(1,3))
    #> torch_tensor
    #>  1  1  2
    #> [ CPULongType{1,3} ]
    

    Now we have a dataloader that returns the target batch shaped as {1,3}:

    ds_ <- tensor_dataset(x,y$reshape(c(1,3)))
    dl_ <- dataloader(ds_, batch_size = 1)
    dl_$.iter()$.next()
    #> [[1]]
    #> torch_tensor
    #>  0.9468 -0.7732  0.7157 -0.6058 -1.7488 -0.7722 -0.0210 -1.6659  0.8572
    #> [ CPUFloatType{1,9} ]
    #> 
    #> [[2]]
    #> torch_tensor
    #>  1  1  2
    #> [ CPULongType{1,3} ]
    

    Just note that this dataloader will not work with linnet from this example, as the target shape no longer matches what CrossEntropyLoss expects.
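
    If the intent was three observations with one class label each, a consistently shaped dataset would give x one input row per target. A sketch (x3 and y3 are stand-ins for the real data):

```r
library(torch)

# Hypothetical data: one input row per target, so x3 has shape {3, 9}
# and y3 stays a 1-d long tensor of 1-based class indices
x3 <- torch_randn(3, 9)
y3 <- torch_tensor(as.integer(c(0, 0, 1) + 1))

ds3 <- tensor_dataset(x3, y3)
dl3 <- dataloader(ds3, batch_size = 1)
length(ds3)  # 3 observations
length(dl3)  # 3 batches of size 1
```

    This keeps the CrossEntropyLoss target contract intact (a 1-d index tensor) while giving the dataloader three batches to actually iterate over.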


    Why is there no error with fit(..., data = <dataset>)?

    fit(xy.ds, epochs = 1) # This doesn't give an error

    According to ?fit.luz_module_generator it should work just fine with different data types (dataloader, dataset, list); internally, luz converts datasets and lists to dataloaders. But unless you pass options explicitly ( fit(xy.ds, dataloader_options = list(...)) ), it uses different defaults than torch::dataloader():
    batch_size = 32, shuffle = TRUE, drop_last = TRUE .
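
    With drop_last = TRUE an incomplete final batch is discarded, so the number of batches is floor(n / batch_size); for this one-observation dataset that is zero. In plain R:

```r
n <- 1            # observations in xy.ds
batch_size <- 32  # luz default
# drop_last = TRUE keeps only complete batches:
n_batches <- floor(n / batch_size)
n_batches
#> [1] 0
```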

    In this case it results in an empty dataloader:

    luz:::apply_dataloader_options(xy.ds, valid_data = NULL, dataloader_options = NULL)
    #> [[1]]
    #> <dataloader>
    #>   Public:
    #> 
    #> [[2]]
    #> NULL
    

    and the training loop exits before the first training batch starts: with a single observation, batch_size = 32, and drop_last = TRUE there are zero complete batches, so the loss function is never called and the "Indexing starts at 1 but found a 0" error never surfaces.
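
    The same zero-batch situation can be reproduced directly with torch::dataloader(), without luz, by applying those defaults to a one-observation dataset:

```r
library(torch)

# One observation, luz-style defaults: batch_size = 32, drop_last = TRUE
ds1 <- tensor_dataset(torch_randn(1, 9), torch_tensor(1L))
dl1 <- dataloader(ds1, batch_size = 32, shuffle = TRUE, drop_last = TRUE)
length(dl1)  # number of batches
#> [1] 0
```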