I have a dataset with column names such as: "b1_BAcid_20_494" "b1_BAcid_20_382" "b1_BAcid_50_100" "b1_BAcid_50_360" "b1_BAcid_50_480" "b1_BAcid_50_750" "b1_monoP_0.2_240" "b1_monoP_0.2_615" "b1_monoP_0.2_527" "b1_monoP_0.2_783".
I am trying to insert a new row that holds the number between the second and third underscores of each column name, and then another row with just the compound name. For example, for "b1_BAcid_20_494", I want a row named "concentration" with value "20" and then a row named "compound" with value "BAcid".
This is what I worked on:
df <- rbind(df, NA) #The dataframe has 500 rows, and I attached a new row with NA as values
rownames(df)[501] <- "concentration" #Named the new row "concentration".
if(grepl("_0.2_", colnames(df))){
concentration <- 0.2
}else if (grepl("_10_", colnames(df))){
concentration <- 10
}else if (grepl("_100_", colnames(df))){
concentration <- 100
}else if (grepl("_2_", colnames(df))){
concentration <- 2
}else if (grepl("_20_", colnames(df))){
concentration <- 20
}else{
concentration <- 50
}
These if-else statements didn't work for "concentration", and I wanted to do something similar for the row "compound" also.
Up front, adding a row of the compound name will convert all non-string columns to strings. For instance,
str(rbind(data.frame(P=pi), data.frame(P="pi")))
# 'data.frame': 2 obs. of 1 variable:
# $ P: chr "3.14159265358979" "pi"
where we can no longer use 3.14159... as a number (without work). Assuming that you're okay with that ...
We can use strcapture to extract the compound and concentration, then convert it to rows of a frame.
vec <- c("b1_BAcid_20_494", "b1_BAcid_20_382", "b1_BAcid_50_100", "b1_BAcid_50_360", "b1_BAcid_50_480", "b1_BAcid_50_750", "b1_monoP_0.2_240", "b1_monoP_0.2_615", "b1_monoP_0.2_527", "b1_monoP_0.2_783")
tmp <- strcapture(".*_(.*)_(.*)_.*", vec, list(compound="", conc=0))
tmp
# compound conc
# 1 BAcid 20.0
# 2 BAcid 20.0
# 3 BAcid 50.0
# 4 BAcid 50.0
# 5 BAcid 50.0
# 6 BAcid 50.0
# 7 monoP 0.2
# 8 monoP 0.2
# 9 monoP 0.2
# 10 monoP 0.2
newrows <- setNames(data.frame(t(tmp)), vec)
newrows
# b1_BAcid_20_494 b1_BAcid_20_382 b1_BAcid_50_100 b1_BAcid_50_360 b1_BAcid_50_480 b1_BAcid_50_750 b1_monoP_0.2_240 b1_monoP_0.2_615 b1_monoP_0.2_527 b1_monoP_0.2_783
# compound BAcid BAcid BAcid BAcid BAcid BAcid monoP monoP monoP monoP
# conc 20.0 20.0 50.0 50.0 50.0 50.0 0.2 0.2 0.2 0.2
You can rbind newrows to your existing frame, assuming that those are the only columns in the original data. If you have other columns not referenced in our vec, then you have options:
- add those columns to newrows, in the correct order, with some reasonable value (NA, "", 0, whatever makes sense with your data); or
- use dplyr::bind_rows(orig, newrows) or data.table::rbindlist(list(orig, newrows), fill=TRUE, use.names=TRUE), as both will work around the not-included names in your orig frame. Note that bind_rows is strict about column types, so numeric columns in orig may first need converting to character.
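As a minimal base-R sketch of the first option, assuming a hypothetical frame orig with one extra column named id (both names are made up for illustration and are not from the question), the missing columns can be added to newrows as NA before binding:

```r
# Hypothetical original frame: two measurement columns plus an extra "id"
# column that does not follow the prefix_compound_conc_suffix naming scheme.
orig <- data.frame(
  id = c("s1", "s2"),
  b1_BAcid_20_494 = c(1.5, 2.5),
  b1_monoP_0.2_240 = c(3.5, 4.5),
  check.names = FALSE
)

# Keep only the column names made of four underscore-separated parts.
vec <- grep("^[^_]+_[^_]+_[^_]+_[^_]+$", names(orig), value = TRUE)

# Same extraction as above: one row per column name, then transpose.
tmp <- strcapture(".*_(.*)_(.*)_.*", vec, list(compound = "", conc = 0))
newrows <- setNames(data.frame(t(tmp), stringsAsFactors = FALSE), vec)

# Add every column that newrows lacks, filled with NA, then match the
# column order of the original frame.
missing_cols <- setdiff(names(orig), names(newrows))
newrows[missing_cols] <- NA
newrows <- newrows[names(orig)]

# The measurement columns become character once the compound row is bound.
combined <- rbind(orig, newrows)
```

The new rows keep the row names compound and conc, and the id column is NA in both of them. NA is only one choice of filler; use whatever value makes sense for the extra columns in your data.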