r data.table melt

How to debug the data.table error "negative length vectors are not allowed"


I am trying to use melt on a data.table:

modern_gt_melted <- data.table::melt(modern_gt, id.vars = c('chrom', 'pos', 'ref', 'alt'))
    Error in melt.data.table(gtData, c("chrom", "pos", "ref", "alt")) : negative length vectors are not allowed

My data table looks like this:

# modern_gt
          chrom      pos ref alt Nea HG01566 NA18593 NA19795 HG01105 HG03225
       1: chr20    10723   T   .  67      66      66      66      66      66
       2: chr20    10724   G   .  67      66      66      66      66      66
       3: chr20    10725   C   .  67      66      66      66      66      66
       4: chr20    10726   C   .  67      66      66      66      66      66
       5: chr20    10727   T   .  67      66      66      66      66      66

I have tried:

(1) Using a subset of the data:

# this works
data.table::melt(modern_gt[1:10000, ], id.vars = c('chrom', 'pos', 'ref', 'alt'))
# this fails with the error above
data.table::melt(modern_gt, id.vars = c('chrom', 'pos', 'ref', 'alt'))

I have also checked for duplicated rows:

# https://stackoverflow.com/questions/42479854/merge-error-negative-length-vectors-are-not-allowed
# not duplicated
modern_gt <- unique(modern_gt)

My data (modern_gt.rds) can be obtained from: https://mega.nz/file/SHhnQCYZ#I7dl625XKreIBc3TYn7nYc_L4TTPcsQFZEwnEwD3qu0


Solution

  • You are probably running into a size (memory) limit: the full melted result is simply too large.

    modern_gt <- readRDS("/home/sapi/Downloads/modern_gt.rds")
    data.table::melt(modern_gt, id.vars = c('chrom', 'pos', 'ref', 'alt'))
    #> Error in melt.data.table(modern_gt, id.vars = c("chrom", "pos", "ref", : negative length vectors are not allowed
    

    However, when you sample it (in this case down to 2 million rows), it works:

    modern_gt <- modern_gt |>
      dplyr::slice_sample(n = 2000000)
    
    data.table::melt(modern_gt, id.vars = c('chrom', 'pos', 'ref', 'alt'))
    #>            chrom      pos ref alt variable value
    #>         1: chr20 14204394   T   .      Nea    76
    #>         2: chr20 16182408   G   .      Nea    76
    #>         3: chr20 19657430   A   .      Nea    77
    #>         4: chr20 20949457   A   G      Nea    77
    #>         5: chr20  9800784   A   .      Nea    77
    #>        ---                                      
    #> 201999996: chr20  9188501   C   A  NA19035    76
    #> 201999997: chr20  1547591   T   .  NA19035    66
    #> 201999998: chr20 19909239   T   .  NA19035    66
    #> 201999999: chr20  1396424   A   .  NA19035    66
    #> 202000000: chr20 20094721   C   G  NA19035    16
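
    The output above gives a hint about the size involved: 2,000,000 input rows become 202,000,000 melted rows, so there are 101 measure columns. My assumption (I have not checked the data.table source) is that the number of melted rows has to fit into a 32-bit integer, which would explain why the full table fails while a sample works. A rough sketch of that arithmetic, where `melted_rows` is a hypothetical helper and not part of data.table:

    ```r
    # Hypothetical helper: number of rows melt() would produce.
    # Computed in double precision so the product itself cannot overflow.
    melted_rows <- function(n_rows, n_measure) {
      as.numeric(n_rows) * as.numeric(n_measure)
    }

    # 101 measure columns, derived from the sampled output above
    n_measure <- 202000000 / 2000000

    # ASSUMPTION: the melted row count must not exceed .Machine$integer.max
    max_rows <- floor(.Machine$integer.max / n_measure)

    melted_rows(2000000, n_measure) <= .Machine$integer.max  # the 2 million row sample fits
    max_rows                                                 # largest input that would still fit
    ```

    Under that assumption, an input with more than roughly 21 million rows would be too large to melt in one go. You can compare this with `nrow(modern_gt)` on the full data.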
    

    I would suggest splitting your rows into smaller chunks, melting each chunk separately, and then combining the results with rbind.

    EDIT:

    That is also what @Billy34 suggested in a comment in the meantime.

    Created on 2023-06-28 with reprex v2.0.2
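
    The chunked approach suggested above could look roughly like this. It is only a sketch: `melt_in_chunks`, `chunk_size`, `process` and the small `toy` table are names and made-up values I introduce for illustration, and the toy table merely imitates the layout of modern_gt.

    ```r
    library(data.table)

    # Hypothetical helper: melt a data.table in row chunks of `chunk_size`
    # and pass each melted chunk to `process` (e.g. to aggregate or write it).
    melt_in_chunks <- function(dt, id_vars, chunk_size, process) {
      starts <- seq(1L, nrow(dt), by = chunk_size)
      lapply(starts, function(s) {
        rows <- s:min(s + chunk_size - 1L, nrow(dt))
        process(melt(dt[rows], id.vars = id_vars))
      })
    }

    # Small stand-in for modern_gt (made-up values, same layout as in the question)
    toy <- data.table(
      chrom = "chr20", pos = 10723:10732, ref = "T", alt = ".",
      Nea = 67L, HG01566 = 66L, NA18593 = 66L
    )

    chunks <- melt_in_chunks(toy, c("chrom", "pos", "ref", "alt"),
                             chunk_size = 4L, process = identity)

    # Combining is only an option while the total result stays small enough
    melted <- rbindlist(chunks)
    ```

    Note that the rows come out ordered by chunk first and by variable second, which differs from a single melt call. Also, if the combined result is itself too large for one data.table, binding the chunks together will presumably fail in the same way. In that case it may be better to pass a `process` function that summarises each chunk or appends it to a file (for example with `fwrite(..., append = TRUE)`) instead of `identity`.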