Tags: r, indexing, torch

In torch for R, fitting with a dataloader yields an "Indexing starts at 1 but found a 0" error


I keep getting the error "Indexing starts at 1 but found a 0" when fitting a model where the input or output started life as R objects (vectors, matrices, or arrays), but only when using a dataloader instead of the dataset itself. Here's a toy example:


library(torch)
library(luz) # For fit() function

x <- torch_randn(1, 9)
y <- torch_tensor(as.integer(c(0,0,1)))
# y <- torch_tensor(as.integer(1)) # This doesn't give an error if out_features=1 below

xy.ds <- tensor_dataset(x,y)
xy.dl <- dataloader(xy.ds, batch_size = 1)

# Create one-layer linear model

linnet <- nn_module(
  initialize = function() {
    self$fc <- nn_linear(in_features = 9, out_features = 3)
  },
  forward = function(x) {
    self$fc(x)
  }
)

fitted <- linnet %>%
  setup(
   loss = nn_cross_entropy_loss(),
   optimizer = optim_adam
  )  %>%
  fit(xy.dl, epochs = 1) # Error in to_index_tensor(target) : Indexing starts at 1 but found a 0.
  # fit(xy.ds, epochs = 1) # This doesn't give an error

Maybe this is some sort of bug in the Python-to-R port. Maybe my computer is having trouble accessing some library (I'm using the R GUI on Windows 11, with the latest versions of R, torch, and luz). Maybe the problem relates to nn_cross_entropy_loss() (which I need for my real project, which uses R-created arrays and nn_conv2d()). Or maybe I'm just being stupid. In any case, I can't even search for help on the function that's complaining: to_index_tensor().


Solution

  • CrossEntropyLoss & zeros as targets

    torch in R always uses 1-based indices. In this configuration, CrossEntropyLoss targets are index tensors (representing discrete classes, e.g. MNIST digits or as.factor(c("cat", "dog"))), and they are explicitly checked for zeros in to_index_tensor():

    // [[Rcpp::export]]
    XPtrTorchTensor to_index_tensor(XPtrTorchTensor t) {
      // check that there's no zeros
      bool zeros = lantern_Tensor_has_any_zeros(t.get());
      if (zeros) {
        Rcpp::stop("Indexing starts at 1 but found a 0.");
      }
    ...
    

    which is called from the internal torch_cross_entropy_loss():

    torch:::torch_cross_entropy_loss
    #> function (self, target, weight = list(), reduction = torch_reduction_mean(), 
    #>     ignore_index = -100L) 
    #> {
    #>     target <- to_index_tensor(target)
    #>     .torch_cross_entropy_loss(self = self, target = target, weight = weight, 
    #>         reduction = reduction, ignore_index = ignore_index)
    #> }
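
    The question mentions not being able to find help for to_index_tensor(); it is an unexported internal, but unexported objects can still be located and inspected from base R via ::: and asNamespace() (assuming the torch package is installed):

```r
library(torch)

# to_index_tensor() is not exported, but it lives in torch's namespace,
# so it can be found and printed without any documentation page:
exists("to_index_tensor", envir = asNamespace("torch"))
torch:::to_index_tensor
```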
    

    We can test this with 0 and 1 targets directly, without going through a dataloader or luz:

    library(torch)
    x <- torch_rand(1,9)
    
    nnf_cross_entropy(x, target = torch_tensor(0L))
    #> Error in `to_index_tensor()`:
    #> ! Indexing starts at 1 but found a 0.
    #> Backtrace:
    #>     ▆
    #>  1. └─torch::nnf_cross_entropy(x, target = torch_tensor(0L))
    #>  2.   └─torch:::torch_cross_entropy_loss(...)
    #>  3.     └─torch:::to_index_tensor(target)
    
    nnf_cross_entropy(x, target = torch_tensor(1L))
    #> torch_tensor
    #> 0.965333
    #> [ CPUFloatType{} ]
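
    As a side note on why 1-based targets are natural in R: factor levels are already coded 1, 2, ..., so class labels converted with as.factor() give valid index targets without any shifting. A base-R sketch, no torch needed:

```r
# Factor levels in R are coded starting at 1, so factor-derived
# class labels are already valid 1-based index targets:
labels <- as.factor(c("cat", "dog", "cat"))
as.integer(labels)
#> [1] 1 2 1
```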
    

    And now the original example, with 1 added to the labels to avoid zeros:

    library(torch)
    library(luz) 
    
    (x <- torch_randn(1, 9))
    #> torch_tensor
    #>  0.4351 -0.3599  0.5954  0.3883 -1.3409 -0.7897  0.1219 -1.2122  2.5252
    #> [ CPUFloatType{1,9} ]
    
    # add +1 to target classes to avoid "found a 0" error when using CrossEntropyLoss 
    (y <- torch_tensor(as.integer(c(0, 0, 1) + 1)))
    #> torch_tensor
    #>  1
    #>  1
    #>  2
    #> [ CPULongType{3} ]
    
    xy.ds <- tensor_dataset(x,y)
    xy.dl <- dataloader(xy.ds, batch_size = 1)
    
    linnet <- nn_module(
      initialize = function() {
        self$fc <- nn_linear(in_features = 9, out_features = 3)
      },
      forward = function(x) {
        self$fc(x)
      }
    )
    
    fitted <- linnet %>%
      setup(
        loss = nn_cross_entropy_loss(),
        optimizer = optim_adam
      )  %>%
      fit(xy.dl, epochs = 1) 
    fitted
    #> A `luz_module_fitted`
    #> ── Time ────────────────────────────────────────────────────────────────────────
    #> • Total time: 172ms
    #> • Avg time per training epoch: 165ms
    #> 
    #> ── Results ─────────────────────────────────────────────────────────────────────
    #> Metrics observed in the last epoch.
    #> 
    #> ℹ Training:
    #> loss: 2.366
    #> 
    #> ── Model ───────────────────────────────────────────────────────────────────────
    #> An `nn_module` containing 30 parameters.
    #> 
    #> ── Modules ─────────────────────────────────────────────────────────────────────
    #> • fc: <nn_linear> #30 parameters
    

    Testing dataloaders

    You may also want to check that the resulting dataset and the values coming out of the dataloader are as expected; currently only the first y value is actually used:

    # test the dataset & the dataloader iterator (no fixed seed, so values differ between runs):
    xy.ds$.getitem(1)
    #> [[1]]
    #> torch_tensor
    #>  0.0689
    #>  0.3410
    #>  0.1130
    #> -0.0152
    #>  0.2865
    #>  0.4140
    #>  1.4283
    #>  0.4356
    #>  0.3323
    #> [ CPUFloatType{9} ]
    #> 
    #> [[2]]
    #> torch_tensor
    #> 1
    #> [ CPULongType{} ]
    xy.dl$.iter()$.next()
    #> [[1]]
    #> torch_tensor
    #>  0.0689  0.3410  0.1130 -0.0152  0.2865  0.4140  1.4283  0.4356  0.3323
    #> [ CPUFloatType{1,9} ]
    #> 
    #> [[2]]
    #> torch_tensor
    #>  1
    #> [ CPULongType{1} ]
    

    This is because of dimensionality: tensor_dataset() indexes its input tensors along the first dimension, and here x (shape {1,9}) contributes only a single observation while y (shape {3}) looks like three. The first iteration returns the first observation of each tensor and also exhausts the dataloader, as there is nothing else to pull from x. So you probably want to make sure tensor_dataset() input tensors are properly shaped to match your network architecture and loss function, either through the shapes of the input R objects:

    torch_tensor(matrix(as.integer(c(0, 0, 1) + 1), ncol = 3, byrow = TRUE))
    #> torch_tensor
    #>  1  1  2
    #> [ CPULongType{1,3} ]
    

    Or by reshaping the tensors:

    y$reshape(c(1,3))
    #> torch_tensor
    #>  1  1  2
    #> [ CPULongType{1,3} ]
    

    Now we have a dataloader that returns the target batch shaped as {1,3}:

    ds_ <- tensor_dataset(x,y$reshape(c(1,3)))
    dl_ <- dataloader(ds_, batch_size = 1)
    dl_$.iter()$.next()
    #> [[1]]
    #> torch_tensor
    #>  0.9468 -0.7732  0.7157 -0.6058 -1.7488 -0.7722 -0.0210 -1.6659  0.8572
    #> [ CPUFloatType{1,9} ]
    #> 
    #> [[2]]
    #> torch_tensor
    #>  1  1  2
    #> [ CPULongType{1,3} ]
    

    Just note that this dataloader will not work with linnet from this example, as the target shape no longer matches what CrossEntropyLoss expects.
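
    If the intent was three observations with one class label each, a consistently shaped dataset would give x one input row per target. A sketch (x3 and y3 are stand-ins for the real data):

```r
library(torch)

# Hypothetical data: one input row per target, so x3 has shape {3, 9}
# and y3 stays a 1-d long tensor of 1-based class indices
x3 <- torch_randn(3, 9)
y3 <- torch_tensor(as.integer(c(0, 0, 1) + 1))

ds3 <- tensor_dataset(x3, y3)
dl3 <- dataloader(ds3, batch_size = 1)
length(ds3)  # 3 observations
length(dl3)  # 3 batches of size 1
```

    This keeps the CrossEntropyLoss target contract intact (a 1-d index tensor) while giving the dataloader three batches to actually iterate over.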


    Why is there no error with fit(..., data = <dataset>)?

    fit(xy.ds, epochs = 1) # This doesn't give an error

    According to ?fit.luz_module_generator it should work just fine with different data types (dataloader, dataset, list); internally, luz converts datasets and lists to dataloaders. But unless you pass options explicitly ( fit(xy.ds, dataloader_options = list(...)) ), it uses different defaults than torch::dataloader():
    batch_size = 32, shuffle = TRUE, drop_last = TRUE .
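
    With drop_last = TRUE an incomplete final batch is discarded, so the number of batches is floor(n / batch_size); for this one-observation dataset that is zero. In plain R:

```r
n <- 1            # observations in xy.ds
batch_size <- 32  # luz default
# drop_last = TRUE keeps only complete batches:
n_batches <- floor(n / batch_size)
n_batches
#> [1] 0
```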

    In this case it results in an empty dataloader:

    luz:::apply_dataloader_options(xy.ds, valid_data = NULL, dataloader_options = NULL)
    #> [[1]]
    #> <dataloader>
    #>   Public:
    #> 
    #> [[2]]
    #> NULL
    

    and the training loop exits before the first training batch starts: with a single observation, batch_size = 32, and drop_last = TRUE there are zero complete batches, so the loss function is never called and the "Indexing starts at 1 but found a 0" error never surfaces.
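
    The same zero-batch situation can be reproduced directly with torch::dataloader(), without luz, by applying those defaults to a one-observation dataset:

```r
library(torch)

# One observation, luz-style defaults: batch_size = 32, drop_last = TRUE
ds1 <- tensor_dataset(torch_randn(1, 9), torch_tensor(1L))
dl1 <- dataloader(ds1, batch_size = 32, shuffle = TRUE, drop_last = TRUE)
length(dl1)  # number of batches
#> [1] 0
```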