r, dataframe

Move subgroup under repeated main group while keeping main group once in data.frame


I'm aware that the question is awkwardly phrased. If I could phrase it better, I'd probably find the solution in another thread.

I have this data structure:

df <- data.frame(group = c("X", "F", "F", "F", "F", "C", "C"),
                 subgroup = c(NA, "camel", "horse", "dog", "cat", "orange", "banana"))

and would like to turn it into this:

data.frame(group = c("X", "F", "camel", "horse", "dog", "cat", "C", "orange", "banana"))

which is surprisingly confusing. Also, I would prefer not to use a loop.

I updated the example to make clear that solutions relying on sorting unfortunately do not work: the groups must stay in their original order (X, F, C), which is not alphabetical.
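To see why sorting breaks the expected output, here is a small base R check (an illustration added for clarity, using only the example data). Sorting, and `factor()` with its default levels, both put the groups in alphabetical order, while the desired result keeps them in order of first appearance. The variable names `sorted_order`, `default_levels` and `appearance_order` are just illustrative.

```r
# Same example data as in the question; stringsAsFactors = FALSE keeps
# group as character on R versions older than 4.0
df <- data.frame(group = c("X", "F", "F", "F", "F", "C", "C"),
                 subgroup = c(NA, "camel", "horse", "dog", "cat", "orange", "banana"),
                 stringsAsFactors = FALSE)

# Sorting by group puts the groups in alphabetical order: C, F, X
sorted_order <- unique(df$group[order(df$group)])

# factor() does the same by default, so split(df, df$group) would too
default_levels <- levels(factor(df$group))

# Order of first appearance, which the desired output follows: X, F, C
appearance_order <- unique(df$group)

print(sorted_order)
print(default_levels)
print(appearance_order)
```

This is why any approach that splits or orders by `group` needs an explicit `levels = unique(group)` (or equivalent) to keep X, F, C in place.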


Solution

  • Here is an (edited) answer for the new data. Using data.table helps a lot. The idea is to split df into one piece per group and lapply() the needed transformation to each piece. Two things need care along the way: keeping the groups in their original order, and dropping the NA that comes from group X.

    library(data.table)

    # convert df to a data.table in place
    setDT(df)

    # to maintain the original ordering, turn group into a factor whose
    # levels are the groups in order of first appearance (X, F, C);
    # split() orders its output by these levels
    df[, group := factor(group, levels = unique(group))]

    # split df into a list with one data.table per group
    df_list <- split(df, df$group, sorted = FALSE)

    # for each group: the group label first, followed by its subgroups
    df_list <- lapply(df_list, function(x)
      data.frame(group = unique(c(as.character(x$group), as.character(x$subgroup)))))

    # bind into one data.table and remove the NA left over from group X
    rbindlist(df_list)[!is.na(group)]
    
        group
    1:      X
    2:      F
    3:  camel
    4:  horse
    5:    dog
    6:    cat
    7:      C
    8: orange
    9: banana
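Since the question asks to avoid loops, here is a vectorised base R sketch as an alternative (not part of the answer above). It assumes that all rows of a group are contiguous, as in the example data. Unlike the `unique()` call in the answer, it keeps duplicate subgroups within a group. The names `first_in_group`, `header`, `interleaved` and `result` are hypothetical.

```r
# Assumption: all rows of a group are contiguous, as in the example data
df <- data.frame(group = c("X", "F", "F", "F", "F", "C", "C"),
                 subgroup = c(NA, "camel", "horse", "dog", "cat", "orange", "banana"),
                 stringsAsFactors = FALSE)

# TRUE on the first row of each group, FALSE on later rows
first_in_group <- !duplicated(df$group)

# Group label only on the first row of a group, NA elsewhere
header <- ifelse(first_in_group, df$group, NA)

# rbind() stacks header above subgroup; c() then reads the 2 x n matrix
# column by column: header_1, subgroup_1, header_2, subgroup_2, ...
interleaved <- c(rbind(header, df$subgroup))

# Drop the NA placeholders to get the desired single column
result <- data.frame(group = interleaved[!is.na(interleaved)],
                     stringsAsFactors = FALSE)
print(result)
```

Because each group label is emitted at the position of its first row, the original group order is preserved without any sorting or splitting.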