r dataframe rep

Problem using rep() in R. Invalid "times" argument


I searched the forum for a solution but didn't find one.

I'm working with a fish database and trying to transform my data frame from this (MRE):

 df_initial <- structure(list(year = c(2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 
2011L), haul = c(11L, 11L, 11L, 11L, 11L, 11L, 11L), species = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L), .Label = "Merluccius merluccius", class = "factor"), 
    length = c(29L, 33L, 34L, 37L, 10L, 11L, 12L), number = c(2L, 
    1L, 1L, 1L, 7L, 4L, 5L)), class = "data.frame", row.names = c(NA, 
-7L))

to this

df_final <- structure(list(year = c(2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 
2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 
2011L, 2011L, 2011L, 2011L, 2011L, 2011L), haul = c(11L, 11L, 
11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 
11L, 11L, 11L, 11L, 11L, 11L), species = structure(c(1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), .Label = "Merluccius merluccius", class = "factor"), 
    length = c(29L, 29L, 33L, 34L, 37L, 10L, 10L, 10L, 10L, 10L, 
    10L, 10L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 12L), number = c(2L, 
    2L, 1L, 1L, 1L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 4L, 4L, 4L, 4L, 
    5L, 5L, 5L, 5L, 5L)), class = "data.frame", row.names = c(NA, 
-21L))

That is, I want to repeat each row as many times as its number value, keeping all the columns.

I've tried several approaches using rep(), but I always get the same error: invalid 'times' argument. I've also tried changing the data types, with no success.

What am I doing wrong?

Here is the last code I ran:

df_final <- df_initial[rep(row.names(df_initial), df_initial$number), 1:5] 

Any help will be more than welcome. Thanks in advance.


Solution

  • The error is most likely caused by NA values in number: rep() rejects a times argument that contains NA. You'll have to deal with these first, either by dropping the affected rows or, if you want to retain them in the output, by replacing NA with some value. Here's how to do both, using either base R or {tidyr}.
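
    As a minimal sketch of the cause (the counts vector below is made-up illustration data, not the asker's real data), this reproduces the error message and shows one way to locate missing values before calling rep():

    ```r
    # Toy counts with one missing value (illustrative data, not from the question)
    counts <- c(2L, NA, 1L)

    # rep() refuses NA in `times`; capture the message instead of stopping
    msg <- tryCatch(
      rep(c("a", "b", "c"), times = counts),
      error = function(e) conditionMessage(e)
    )
    msg

    # Locate the offending positions before replicating
    anyNA(counts)
    which(is.na(counts))
    ```

    Running the same two checks on df_initial$number in the real data should show whether NA values are indeed the culprit.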

    Remove rows with NAs

    base R:

    # add NA values to example
    df_initial$number[5:6] <- NA_integer_
    
    df_cleaned <- df_initial[!is.na(df_initial$number), ]
    df_final <- df_cleaned[rep(row.names(df_cleaned), df_cleaned$number), 1:5]
    
    df_final
    
    #>     year haul               species length number
    #> 1   2011   11 Merluccius merluccius     29      2
    #> 1.1 2011   11 Merluccius merluccius     29      2
    #> 2   2011   11 Merluccius merluccius     33      1
    #> 3   2011   11 Merluccius merluccius     34      1
    #> 4   2011   11 Merluccius merluccius     37      1
    #> 7   2011   11 Merluccius merluccius     12      5
    #> 7.1 2011   11 Merluccius merluccius     12      5
    #> 7.2 2011   11 Merluccius merluccius     12      5
    #> 7.3 2011   11 Merluccius merluccius     12      5
    #> 7.4 2011   11 Merluccius merluccius     12      5
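
    A variant of the base R approach, sketched on a small made-up data frame (df, df_kept and expanded are illustrative names): indexing by row position with seq_len() instead of by row name, then resetting the row names, should give the same rows with plain 1..n row names, like the tidyr output.

    ```r
    # Small illustrative data frame (not the asker's data)
    df <- data.frame(length = c(29L, 33L, 10L), number = c(2L, NA, 3L))

    # Drop rows whose count is missing
    df_kept <- df[!is.na(df$number), ]

    # Repeat each remaining row `number` times, indexing by position
    expanded <- df_kept[rep(seq_len(nrow(df_kept)), times = df_kept$number), ]

    # Replace row names such as "1.1" with a fresh 1..n sequence
    rownames(expanded) <- NULL
    expanded
    ```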
    

    tidyr:

    library(tidyr)
    
    df_final <- df_initial %>% 
      drop_na(number) %>% 
      uncount(weights = number, .remove = FALSE)
    
    df_final
    
    #>    year haul               species length number
    #> 1  2011   11 Merluccius merluccius     29      2
    #> 2  2011   11 Merluccius merluccius     29      2
    #> 3  2011   11 Merluccius merluccius     33      1
    #> 4  2011   11 Merluccius merluccius     34      1
    #> 5  2011   11 Merluccius merluccius     37      1
    #> 6  2011   11 Merluccius merluccius     12      5
    #> 7  2011   11 Merluccius merluccius     12      5
    #> 8  2011   11 Merluccius merluccius     12      5
    #> 9  2011   11 Merluccius merluccius     12      5
    #> 10 2011   11 Merluccius merluccius     12      5
    

    Replace NAs

    base R:

    df_cleaned <- df_initial
    df_cleaned$number[is.na(df_initial$number)] <- 1L
    df_final <- df_cleaned[rep(row.names(df_cleaned), df_cleaned$number), 1:5]
    
    df_final
    
    #>     year haul               species length number
    #> 1   2011   11 Merluccius merluccius     29      2
    #> 1.1 2011   11 Merluccius merluccius     29      2
    #> 2   2011   11 Merluccius merluccius     33      1
    #> 3   2011   11 Merluccius merluccius     34      1
    #> 4   2011   11 Merluccius merluccius     37      1
    #> 5   2011   11 Merluccius merluccius     10      1
    #> 6   2011   11 Merluccius merluccius     11      1
    #> 7   2011   11 Merluccius merluccius     12      5
    #> 7.1 2011   11 Merluccius merluccius     12      5
    #> 7.2 2011   11 Merluccius merluccius     12      5
    #> 7.3 2011   11 Merluccius merluccius     12      5
    #> 7.4 2011   11 Merluccius merluccius     12      5
    

    tidyr:

    df_final <- df_initial %>% 
      replace_na(list(number = 1L)) %>% 
      uncount(weights = number, .remove = FALSE)
    df_final
    
    #>    year haul               species length number
    #> 1  2011   11 Merluccius merluccius     29      2
    #> 2  2011   11 Merluccius merluccius     29      2
    #> 3  2011   11 Merluccius merluccius     33      1
    #> 4  2011   11 Merluccius merluccius     34      1
    #> 5  2011   11 Merluccius merluccius     37      1
    #> 6  2011   11 Merluccius merluccius     10      1
    #> 7  2011   11 Merluccius merluccius     11      1
    #> 8  2011   11 Merluccius merluccius     12      5
    #> 9  2011   11 Merluccius merluccius     12      5
    #> 10 2011   11 Merluccius merluccius     12      5
    #> 11 2011   11 Merluccius merluccius     12      5
    #> 12 2011   11 Merluccius merluccius     12      5
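
    Whichever variant is used, a quick sanity check may help confirm the expansion. The sketch below uses a small made-up data frame (illustrative names; it assumes length is unique per row) and verifies that the result has sum(number) rows and that each row appears number times:

    ```r
    # Small illustrative data frame (not the asker's data)
    df <- data.frame(length = c(29L, 33L, 10L), number = c(2L, NA, 3L))

    # Treat a missing count as a single fish, as in the answer above
    df$number[is.na(df$number)] <- 1L

    expanded <- df[rep(seq_len(nrow(df)), times = df$number), ]

    # Total row count should equal the sum of the counts
    rows_ok <- nrow(expanded) == sum(df$number)

    # Each length should occur exactly `number` times
    # (assumes length values are unique per original row)
    per_length <- as.vector(table(factor(expanded$length, levels = df$length)))
    counts_ok <- all(per_length == df$number)

    rows_ok
    counts_ok
    ```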
    

    Created on 2022-03-15 by the reprex package (v2.0.1)