
Inserting a new row with value extracted from column name


I have a dataset with column names such as: "b1_BAcid_20_494" "b1_BAcid_20_382" "b1_BAcid_50_100" "b1_BAcid_50_360" "b1_BAcid_50_480" "b1_BAcid_50_750" "b1_monoP_0.2_240" "b1_monoP_0.2_615" "b1_monoP_0.2_527" "b1_monoP_0.2_783".

I am trying to insert a new row that consists of the number between the second and third underscores of each column name, and then another row with just the compound name. For example, for "b1_BAcid_20_494", I want a row named "concentration" with value "20" and then a row named "compound" with value "BAcid".

This is what I worked on:

df <- rbind(df, NA) #The dataframe has 500 rows, and I attached a new row with NA as values
rownames(df)[501] <- "concentration" #Named the new row "concentration".


if(grepl("_0.2_", colnames(df))){
  concentration <- 0.2
}else if (grepl("_10_", colnames(df))){
  concentration <- 10
}else if (grepl("_100_", colnames(df))){
  concentration <- 100
}else if (grepl("_2_", colnames(df))){
  concentration <- 2
}else if (grepl("_20_", colnames(df))){
  concentration <- 20
}else{
  concentration <- 50
}

These if-else statements didn't work for "concentration", and I wanted to do something similar for the row "compound" also.


Solution

  • Up front, adding a row of the compound name will convert all non-string columns to strings. For instance,

    str(rbind(data.frame(P=pi), data.frame(P="pi")))
    # 'data.frame': 2 obs. of  1 variable:
    #  $ P: chr  "3.14159265358979" "pi"
    

    where we can no longer use 3.14159... as a number (without work). Assuming that you're okay with that ...

    We can use strcapture to extract the compound and concentration from each name, then transpose the result so that each becomes a row of a frame.

    vec <- c("b1_BAcid_20_494", "b1_BAcid_20_382", "b1_BAcid_50_100", "b1_BAcid_50_360", "b1_BAcid_50_480", "b1_BAcid_50_750", "b1_monoP_0.2_240", "b1_monoP_0.2_615", "b1_monoP_0.2_527", "b1_monoP_0.2_783")
    tmp <- strcapture(".*_(.*)_(.*)_.*", vec, list(compound="", conc=0))
    tmp
    #    compound conc
    # 1     BAcid 20.0
    # 2     BAcid 20.0
    # 3     BAcid 50.0
    # 4     BAcid 50.0
    # 5     BAcid 50.0
    # 6     BAcid 50.0
    # 7     monoP  0.2
    # 8     monoP  0.2
    # 9     monoP  0.2
    # 10    monoP  0.2
    
    newrows <- setNames(data.frame(t(tmp)), vec)
    newrows
    #          b1_BAcid_20_494 b1_BAcid_20_382 b1_BAcid_50_100 b1_BAcid_50_360 b1_BAcid_50_480 b1_BAcid_50_750 b1_monoP_0.2_240 b1_monoP_0.2_615 b1_monoP_0.2_527 b1_monoP_0.2_783
    # compound           BAcid           BAcid           BAcid           BAcid           BAcid           BAcid            monoP            monoP            monoP            monoP
    # conc                20.0            20.0            50.0            50.0            50.0            50.0              0.2              0.2              0.2              0.2
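
    The pattern above relies on every name having exactly four underscore-separated fields. As a minimal sketch, assuming you would rather get NA for names that don't fit than whatever the greedy .* happens to capture, an anchored pattern could look like this (the name strict and the malformed third element are made up for illustration):

    ```r
    # Third element is a hypothetical malformed name with only three fields
    vec2 <- c("b1_BAcid_20_494", "b1_monoP_0.2_240", "b1_BAcid_20")

    # Anchored pattern: exactly four fields, none of which contains "_"
    strict <- "^[^_]+_([^_]+)_([^_]+)_[^_]+$"

    parsed <- strcapture(strict, vec2, list(compound = "", conc = 0))
    parsed
    # rows 1-2 parse as before; row 3 is NA in both columns

    # Names that failed to parse can then be inspected before building newrows
    vec2[is.na(parsed$compound)]
    ```

    Whether failing loudly like this is preferable depends on how regular your column names really are.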
    

    You can rbind newrows to your existing frame assuming that those are the only columns in the original data. If you have other columns not referenced in our vec, then you have options:

    1. Create the not-yet-included columns in newrows, in the correct order, with some reasonable value (NA, "", 0, whatever makes sense with your data).
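    2. Use dplyr::bind_rows(orig, newrows) or data.table::rbindlist(list(orig, newrows), fill=TRUE, use.names=TRUE); both fill in columns that are present in orig but missing from newrows. Note that recent dplyr versions are strict about column types and may refuse to combine a numeric column with a character one, so you may need to convert the columns of orig to character first.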
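
    Option 1 can be sketched in base R as follows. The frame orig, its id column, and its values are made up for illustration; substitute your own data.

    ```r
    # Hypothetical original frame: an extra "id" column plus two measurement columns
    orig <- data.frame(id = c("s1", "s2"),
                       b1_BAcid_20_494 = c(1.5, 2.5),
                       b1_monoP_0.2_240 = c(3.5, 4.5),
                       check.names = FALSE)  # keep names such as "b1_monoP_0.2_240" as-is

    # Build the two new rows from the measurement column names only
    vec <- c("b1_BAcid_20_494", "b1_monoP_0.2_240")
    tmp <- strcapture(".*_(.*)_(.*)_.*", vec, list(compound = "", conc = 0))
    newrows <- setNames(data.frame(t(tmp)), vec)

    # Add the columns that newrows lacks, filled with NA, then match orig's column order
    missing_cols <- setdiff(names(orig), names(newrows))
    newrows[missing_cols] <- NA
    newrows <- newrows[names(orig)]

    combined <- rbind(orig, newrows)
    str(combined)  # the measurement columns are now character, as warned up front
    ```

    The new rows keep their row names ("compound" and "conc"), so they can be pulled back out with combined["compound", ] if needed.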