I have a dataset that looks like this when pivoted long
data<-data.frame(
ID=c("type1","type1","type1","type1","type1","type1","type2","type2","type2","type2","type2","type2"),
tag=c(1,2,3,4,5,6,1,2,3,4,5,6),
value=c(0.2,0.7,0.45,0.4,1,0,0.056,0.12,0.3,0.67,0.09,0.004)
)
Is there a way after grouping by the id to organize the values by size taking the top 3 most and then throw a flag if specific tags are within a threshold of each other and saved.
For instance: if the threshold was set to 0.05 type1 would be [1,0.7,0.45,0.4,0.2,0]
and then a flag would be raised that tags 3 and 4 were within the threshold boundary of each other.
The goal is to be able to add a code to each type made up of the top most tags with a note if there are any within the threshold.
So the type one label would be 523 (with a 4 within the threshold) and type2 would be 631 but no flag is thrown as there is not a value within 0.05 of the top 3 values
id | label | flag |
---|---|---|
type1 | 523 | 4 within threshold |
type2 | 631 | NA |
I am currently able to order it but the internal threshold is what is giving me issues with how to approach the problem.
threshold = 0.05
data |>
arrange(ID, -value) |>
summarize(label = paste0(tag[row_number() <= 3], collapse = ""),
within_threshold = paste0(
tag[row_number() > 3 &
value[row_number() == 3] - value <= threshold],
collapse = ""),
.by = ID)
In type1, the #4 value 0.40 [label 4] is within 0.05 of 0.40, the #3 value. In type2, the #4 value 0.09 [label 5] is within 0.05 of 0.12, the #3 value.
ID label within_threshold
1 type1 523 4
2 type2 432 5
Or