r dataframe write.table

R keeps giving the error "unimplemented type 'list' in 'EncodeElement'"


Here is an example of the df I have:

df <- data.frame(
  id = c(1, 2, 3, 4, 5),
  age = c(30, 45, 50, 60, 35),
  sis_num = c(0,1, 2, 3, 4),
  sis1_bca_status = c(0, 1, 0, 1, 0),
  sis2_bca_status = c(0, 0, 1, 0, 1),
  sis1_bca_age = c(0, 40, 50, 60, 70),
  sis2_bca_age = c(0, 45, 55, 65, 75),
  menopause_status = c(0, 1, 2, 0, 1),
  meno_age = c(55, 60, 50, 55, 52),
  hrt_status = c(0, 1, 2, 1, 0)
)

I need to go through each row of this df and produce a new df. It should copy the values for each id from this df and then, based on the value of sis_num (number of sisters), add seven values to the output per sister. If sis_num = 0, there are no extra values; if sis_num = 1, 7 extra values; if sis_num = 2, 14 extra values; and so on. So in the output, each row will have a different number of columns. (I need this output in txt format to feed into another program that calculates something else, and that is the only input it will take.) This is what I have done so far:

# Function to modify each row based on sis_num
create_row <- function(row) {
  # Extract values from the row using the correct indexing
  id <- row[1]
  age <- row[2]
  sis_num <- row[3]
  sis1_bca_status <- row[4]
  sis2_bca_status <- row[5]
  sis1_bca_age <- row[6]
  sis2_bca_age <- row[7]
  menopause_status <- row[8]
  meno_age <- row[9]
  hrt_status <- row[10]

  # Initialize the new row with id, age, sis_num
  new_row <- c(id, age, sis_num)

  # Add dynamic values for the first two sisters
  if (sis_num == 0) {
    # If sis_num is 0, return the row as it is
    new_row <- c(new_row)
  } else if (sis_num == 1) {
    new_row <- c(new_row, sis1_bca_status, 0, 0, sis1_bca_age, -99, -99, 0)
  } else if (sis_num == 2) {
    new_row <- c(new_row, sis1_bca_status, 0, 0, sis1_bca_age, -99, -99, 0,
                 sis2_bca_status, 0, 0, sis2_bca_age, -99, -99, 0)
  } else if (sis_num > 2) {
    new_row <- c(new_row, sis1_bca_status, 0, 0, sis1_bca_age, -99, -99, 0,
                 sis2_bca_status, 0, 0, sis2_bca_age, -99, -99, 0)
    # Add static values for remaining sisters (sis3, sis4, etc.)
    for (i in 3:sis_num) {
      new_row <- c(new_row, 0, 0, 0, -99, -99, -99, 0)
    }
  }

  # Add the remaining columns (menopause_status, meno_age, hrt_status)
  new_row <- c(new_row, menopause_status, meno_age, hrt_status)
  return(new_row)
}

# Apply the function to each row of the dataframe 
output_df <- t(apply(df, 1, create_row))
output_df2 <- do.call(rbind, output_df)
# Convert the output to a dataframe
output_df2 <- as.data.frame(output_df2)
print(output_df2)

The output I get for output_df2 is:


                                                                                                                             V1
V1                                                                                                           1, 30, 0, 1, 55, 0
V2                                                                                 2, 45, 1, 1, 0, 0, 40, -99, -99, 0, 1, 60, 1
V3                                                       3, 50, 2, 0, 0, 0, 50, -99, -99, 0, 1, 0, 0, 55, -99, -99, 0, 2, 50, 2
V4                            4, 60, 3, 1, 0, 0, 60, -99, -99, 0, 0, 0, 0, 65, -99, -99, 0, 0, 0, 0, -99, -99, -99, 0, 0, 55, 1
V5 5, 35, 4, 0, 0, 0, 70, -99, -99, 0, 1, 0, 0, 75, -99, -99, 0, 0, 0, 0, -99, -99, -99, 0, 0, 0, 0, -99, -99, -99, 0, 1, 52, 0
> write.table(output_df2, file = "output_df.txt", sep = "\t", row.names = FALSE, col.names = FALSE, quote = FALSE)
Error in write.table(output_df2, file = "output_df.txt", sep = "\t", row.names = FALSE,  : 
  unimplemented type 'list' in 'EncodeElement'
                

I want this output in a txt file without the V1, V2, V3 labels. Each id gets its own row, and the number of columns per row (or id) varies with sis_num. To fix the list issue, I tried a previous solution shared here:

> output_df2 <- apply(output_df,2,as.character) 
write.csv(output_df2,file = "output_df.csv")

The csv output this produces isn't what I want. How do I fix this?

In real life, I have 80 variables in the input file. Is there any way I can copy them into the new dataframe faster, rather than indexing each value?


Solution

  • I do not really understand your function. Here is what I understood of it:

    f = \(r) {
      i = r[c('id', 'age', 'sis_num')]
      j = r[c('menopause_status', 'meno_age', 'hrt_status')]
      if (r['sis_num'] == 0) {
        c(i, j)
      } else if (r['sis_num'] == 1) {
        c(i, r['sis1_bca_status'], 0, 0, r['sis1_bca_age'], -99, -99, 0, j)
      } else {
        c(i, r['sis1_bca_status'], 0, 0, r['sis1_bca_age'], -99, -99, 0,
          r['sis2_bca_status'], 0, 0, r['sis2_bca_age'], -99, -99, 0, j)
      }
    }
    

    Please add back whatever is missing (for instance, the handling of sis_num > 2). Applying f row-wise:

    l = apply(df, 1, \(x) toString(f(x)))
    

    l is a character vector with one string per row, which looks like

    > l
    [1] "1, 30, 0, 0, 55, 0"
    [2] "2, 45, 1, 1, 0, 0, 40, -99, -99, 0, 1, 60, 1"
    [3] "3, 50, 2, 0, 0, 0, 50, -99, -99, 0, 1, 0, 0, 55, -99, -99, 0, 2, 50, 2"
    [4] "4, 60, 3, 1, 0, 0, 60, -99, -99, 0, 0, 0, 0, 65, -99, -99, 0, 0, 55, 1"
    [5] "5, 35, 4, 0, 0, 0, 70, -99, -99, 0, 1, 0, 0, 75, -99, -99, 0, 1, 52, 0"
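
As background on the error in the question: apply() can only simplify its result to a vector or matrix when every call returns a value of the same length. With rows of different lengths it returns a list, and wrapping that list in a data frame gives the list column that write.table() rejects. Collapsing each row to a single string first, as toString() does here, avoids that. A minimal sketch with a made-up two-row data frame (toy, ragged and flat are illustrative names, not from the question):

```r
# Toy data: the second row produces a longer result than the first.
toy <- data.frame(id = c(1, 2), n = c(0, 2))

# Returns id followed by n zeros, so the result length differs per row.
ragged <- apply(toy, 1, function(r) c(r[["id"]], rep(0, r[["n"]])))

is.list(ragged)    # TRUE: apply() could not simplify to a matrix
lengths(ragged)    # 1 and 3

# Collapsing each element to one string gives a plain character vector,
# which can be written to a file line by line.
flat <- vapply(ragged, toString, character(1))
is.character(flat) # TRUE
```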
    

    We can use cat() to write l to disk as a file called test.csv, one row per line.

    cat(l, sep='\n', file='test.csv')
    

    It has been stored in the working directory. Type getwd() to see where.
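
To cover the two points left open (more than two sisters, and many input columns), here is one possible sketch. It assumes that every column is numeric, that sister columns follow the sis<k>_bca_status / sis<k>_bca_age naming from the example, and that sisters without their own columns get the placeholder block 0, 0, 0, -99, -99, -99, 0 used in the question. build_row and its lead argument are hypothetical names introduced for illustration.

```r
# Example data from the question.
df <- data.frame(
  id = c(1, 2, 3, 4, 5),
  age = c(30, 45, 50, 60, 35),
  sis_num = c(0, 1, 2, 3, 4),
  sis1_bca_status = c(0, 1, 0, 1, 0),
  sis2_bca_status = c(0, 0, 1, 0, 1),
  sis1_bca_age = c(0, 40, 50, 60, 70),
  sis2_bca_age = c(0, 45, 55, 65, 75),
  menopause_status = c(0, 1, 2, 0, 1),
  meno_age = c(55, 60, 50, 55, 52),
  hrt_status = c(0, 1, 2, 1, 0)
)

# Hypothetical helper: r is one row as a named numeric vector.
build_row <- function(r, lead = c("id", "age", "sis_num")) {
  n <- r[["sis_num"]]
  # Sister columns are found by name; everything else that is not a
  # leading column is copied unchanged, so no per-column indexing is needed.
  sis_cols <- grep("^sis[0-9]+_bca_", names(r), value = TRUE)
  tail_cols <- setdiff(names(r), c(lead, sis_cols))

  sisters <- unlist(lapply(seq_len(n), function(k) {
    status <- paste0("sis", k, "_bca_status")
    age <- paste0("sis", k, "_bca_age")
    if (status %in% names(r)) {
      c(r[[status]], 0, 0, r[[age]], -99, -99, 0)  # sister with data
    } else {
      c(0, 0, 0, -99, -99, -99, 0)                 # placeholder sister
    }
  }))

  unname(c(r[lead], sisters, r[tail_cols]))
}

# One tab-separated string per row, then one line per row in the file.
lines <- apply(df, 1, function(r) paste(build_row(r), collapse = "\t"))
writeLines(lines, "output_df.txt")
```

Because the trailing columns are derived from the column names, the same function should work unchanged with 80 input variables, as long as the naming assumption holds. If the real data contains character columns, apply() converts the whole row to character, so the numeric comparisons would need adjusting.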