r histogram rep

Need an R function to replicate X data by Y counts, where X contains some repeated values


I have a fairly large data set (18,000 rows) with 2 columns of interest. I would like to treat one (X) as the quantitative values and the other (Y) as counts, and repeat each X value according to its count. Due to the nature of the data, there are repeated values in the X column, and I just want to create a new data set containing all X values with their repetitions. I have tried the following, but it returns an invalid 'times' argument error: rep(df$X, df$Y)

I am not sure why this error is occurring, and don't know where to go from here. Any help is appreciated. Below is a small sample of my data.

8.76    3
24.69   0
6.24    2
1.17    0
6.54    3
10.29   0
11.04   1
16.71   1

Solution

  • I can reproduce that error when one or more Y values are NA (or negative):

    df
    #       X  Y
    # 1  8.76  3
    # 2 24.69 NA
    # 3  6.24  2
    # 4  1.17  0
    # 5  6.54  3
    # 6 10.29  0
    # 7 11.04  1
    # 8 16.71  1
    rep(df$X, df$Y)
    # Error in rep(df$X, df$Y) : invalid 'times' argument
    df$Y[2] <-  -1
    rep(df$X, df$Y)
    # Error in rep(df$X, df$Y) : invalid 'times' argument
    

    We can replace NA (and negative) counts with 0 using pmax():

    rep(df$X, pmax(0, df$Y, na.rm = TRUE))
    #  [1]  8.76  8.76  8.76  6.24  6.24  6.54  6.54  6.54 11.04 16.71
    

    Data

    df <- structure(list(X = c(8.76, 24.69, 6.24, 1.17, 6.54, 10.29, 11.04, 16.71), Y = c(3L, NA, 2L, 0L, 3L, 0L, 1L, 1L)), row.names = c(NA, -8L), class = "data.frame")
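
Before zeroing out the bad counts, it may help to see which rows are causing the error. The following is a minimal sketch using the sample data above; the names `bad`, `counts`, and `expanded` are made up for illustration, and treating invalid counts as 0 is an assumption that may not suit every data set (dropping or correcting those rows could be more appropriate).

```r
# Sample data from the answer above (row 2 has a missing count)
df <- data.frame(
  X = c(8.76, 24.69, 6.24, 1.17, 6.54, 10.29, 11.04, 16.71),
  Y = c(3L, NA, 2L, 0L, 3L, 0L, 1L, 1L)
)

# Flag counts that rep() rejects: missing or negative values
bad <- is.na(df$Y) | df$Y < 0

# Inspect the offending rows before deciding how to handle them
df[bad, ]

# Assumption: treat invalid counts as zero, so those X values are dropped
counts <- ifelse(bad, 0L, df$Y)

# Build the expanded data set, one row per repeated measurement
expanded <- data.frame(X = rep(df$X, times = counts))

# Sanity check: the number of rows should equal the total of the counts
nrow(expanded) == sum(counts)
```

The expanded column can then be passed to functions that expect raw observations, for example `hist(expanded$X)`.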