I have a data frame df_1
to which I added a column 'rank'
for each column using the rank
function:
df_1 <- data.frame(A = c(5, 6.5, 5 , 7), B = c(0.1, 0.9, 0.7, 0.01), C = c(3.5, 3.5, 1, 2), D = c(0.1, 0.5, 0.7, 0.9))
df_1$A.rank <- rank(df_1$A, ties.method = "average")
df_1$B.rank <- rank(-df_1$B, ties.method = "average")
df_1$C.rank <- rank(df_1$C, ties.method = "average")
df_1$D.rank <- rank(-df_1$D, ties.method = "average")
print(df_1)
A B C D A.rank B.rank C.rank D.rank
1 5.0 0.10 3.5 0.1 1.5 3 3.5 4
2 6.5 0.90 3.5 0.5 3.0 1 3.5 3
3 5.0 0.70 1.0 0.7 1.5 2 1.0 2
4 7.0 0.01 2.0 0.9 4.0 4 2.0 1
I have another data frame df_2
containing scores associated with each rank:
df_2 <- data.frame(score_rank = c(1, 2, 3, 4), score = c(0.37, 0.25, 0.2, 0.18))
print(df_2)
score_rank score
1 1 0.37
2 2 0.25
3 3 0.20
4 4 0.18
I replaced the values in the 'rank
' columns of df_1
with the values in the 'score
' column of df_2
based on the 'score_rank
' column of df_2
.
new_df_1 <- df_1[, c("A.rank", "B.rank", "C.rank", "D.rank")]
new_df_1[] <- df_2$score[match(unlist(new_df_1), df_2$score_rank)]
print(new_df_1)
A.rank B.rank C.rank D.rank
1 NA 0.20 NA 0.18
2 0.20 0.37 NA 0.20
3 NA 0.25 0.37 0.25
4 0.18 0.18 0.25 0.37
Question: how to automatically replace the NA values in df_1
to achieve this result:
A.rank B.rank C.rank D.rank
1 0.31 0.20 0.19 0.18
2 0.20 0.37 0.19 0.20
3 0.31 0.25 0.37 0.25
4 0.18 0.18 0.25 0.37
NA in row 1 of the column 'A.rank
': mean of values in the 'score
' column in 'df_2
' for the ranks 1 and 2 (i.e., mean(c(0.37, 0.25))) because the value 1.5 calculated by the rank
function is the average of the ranks 1 and 2.
NA in row 3 of the column 'A.rank
': same value as the NA in row 1 of the column 'A.rank
'.
NA in row 1 of the column 'C.rank
': mean of values in the 'score
' column in 'df_2
' for the ranks 3 and 4 (i.e., mean(c(0.2, 0.18))) because the value 3.5 calculated by the 'rank
' function is the average of the ranks 3 and 4.
NA in row 2 of the column 'C.rank
': same value as the NA in row 1 of the column 'C.rank
'
Update : Here is another example :
df_1 <- data.frame(A = c(5, 5, 5 , 5), B = c(0.1, 0.9, 0.7, 0.01), C = c(3.5, 3.5, 1, 2), D = c(0.1, 0.5, 0.7, 0.9))
*## Here is df_1*
A B C D A.rank B.rank C.rank D.rank
1 5 0.10 3.5 0.1 2.5 3 3.5 4
2 5 0.90 3.5 0.5 2.5 1 3.5 3
3 5 0.70 1.0 0.7 2.5 2 1.0 2
4 5 0.01 2.0 0.9 2.5 4 2.0 1
## Here is df_2:
score_rank score
1 1 0.37
2 2 0.25
3 3 0.20
4 4 0.18
## Here is new_df_1
A.rank B.rank C.rank D.rank
1 NA 0.20 NA 0.18
2 NA 0.37 NA 0.20
3 NA 0.25 0.37 0.25
4 NA 0.18 0.25 0.37
Results:
A.rank B.rank C.rank D.rank
1 2.5 0.20 0.19 0.18
2 2.5 0.37 0.19 0.20
3 2.5 0.25 0.37 0.25
4 2.5 0.18 0.25 0.37
1. NA in (row = 1, column = 1): mean(c(df_2[1:4, c("score")])) because 2.5 = mean(c(1, 2, 3, 4)) using the rank function
2. NA in (row = 2, column = 1): like 1)
3. NA in (row = 3, column = 1): like 1)
4. NA in (row = 4, column = 1): like 1)
5. NA in (row = 1, column = 3): mean(c(df_2[3:4, c("score")])) because 3.5 = mean(c(3, 4)) using the rank function
6. NA in (row = 2, column = 3): like 5)
An approach using approx
EDIT: Use method="constant"
in your case
uL <- unique(unlist(df_1[,5:8]))
appx <- approx(df_2$score, n=length(seq(min(uL), max(uL), .5)),
method="constant")
data.frame(sapply(df_1[,5:8], \(x) appx$y[match(x, appx$x)]))
A.rank B.rank C.rank D.rank
1 0.31 0.20 0.19 0.18
2 0.20 0.37 0.19 0.20
3 0.31 0.25 0.37 0.25
4 0.18 0.18 0.25 0.37
New data
df_1 <- structure(list(A = c(5L, 5L, 5L, 5L), B = c(0.1, 0.9, 0.7, 0.01
), C = c(3.5, 3.5, 1, 2), D = c(0.1, 0.5, 0.7, 0.9), A.rank = c(2.5,
2.5, 2.5, 2.5), B.rank = c(3L, 1L, 2L, 4L), C.rank = c(3.5, 3.5,
1, 2), D.rank = 4:1), class = "data.frame", row.names = c("1",
"2", "3", "4"))
uL <- unique(unlist(df_1[,5:8]))
appx <- approx(df_2$score, n=length(seq(min(uL), max(uL), .5)),
method="constant")
data.frame(sapply(df_1[,5:8], \(x) appx$y[match(x, appx$x)]))
A.rank B.rank C.rank D.rank
1 0.25 0.20 0.20 0.18
2 0.25 0.37 0.20 0.20
3 0.25 0.25 0.37 0.25
4 0.25 0.18 0.25 0.37