I would like to subtract the mean of each of the "blank" columns from each of the values on the preceding 3 columns in a dataset that looks something like this:
df <- data.frame(da=1:5, d2=6:10, dd=2:6,
blank...1=c(0.1, 0.1, 0.4, 0.2, 0.1), d5=2:6, dg=7:11,
di=3:7, blank...2=c(0.2, 0.2, 0.4, 0.1, 0.1), dm=21:25,
h4=5:9, d7=26:30, blank...3=c(0.1, 0.3, 0.4, 0.4, 0.1))
df
# da d2 dd blank...1 d5 dg di blank...2 dm h4 d7 blank...3
# 1 1 6 2 0.1 2 7 3 0.2 21 5 26 0.1
# 2 2 7 3 0.1 3 8 4 0.2 22 6 27 0.3
# 3 3 8 4 0.4 4 9 5 0.4 23 7 28 0.4
# 4 4 9 5 0.2 5 10 6 0.1 24 8 29 0.4
# 5 5 10 6 0.1 6 11 7 0.1 25 9 30 0.1
In other words, I want to subtract the mean of blank...1 to each value of columns da, d2, and dd; then subtract the mean of blank...2 to each value of columns d5, dg, and di and so on. The actual dataset actually has 15 blanks that need to be subtracted from the prior 11 columns.
I have generated the means of all the "blank" columns in a vector using colMeans
but then I do not know how to write the function to indicate which value to use for which columns.
How can this be accomplished through a function?
Here's another base R approach inspired by jpsmith:
blanks = grep("blank", names(df)) # find the blank columns
blank_means = colMeans(df[blanks]) # get their means
## subtract blanks means from the non-blank columns
## this relies on having exactly 3 non-blank columns per blank column
df[-blanks] = Map("-", df[-blanks], rep(blank_means, each = 3))
df
# da d2 dd blank...1 d5 dg di blank...2 dm h4 d7 blank...3
# 1 0.82 5.82 1.82 0.1 1.8 6.8 2.8 0.2 20.74 4.74 25.74 0.1
# 2 1.82 6.82 2.82 0.1 2.8 7.8 3.8 0.2 21.74 5.74 26.74 0.3
# 3 2.82 7.82 3.82 0.4 3.8 8.8 4.8 0.4 22.74 6.74 27.74 0.4
# 4 3.82 8.82 4.82 0.2 4.8 9.8 5.8 0.1 23.74 7.74 28.74 0.4
# 5 4.82 9.82 5.82 0.1 5.8 10.8 6.8 0.1 24.74 8.74 29.74 0.1