Suppose we have a simple numeric matrix with two columns, lo and li:
df <- structure(c(2, 4, 5, 6, 8, 1, 2, 4, 6, 67, 8, 11), dim = c(6L, 2L),
                dimnames = list(NULL, c("lo", "li")))
How can I find the percentile rank of each observation within each of the two variables?
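A tidyverse-friendly approach is to (i) convert the matrix to a tibble, (ii) reshape the data into long format, (iii) group by variable (lo and li), and (iv) calculate the percent rank within each group. dplyr's percent_rank() returns values between 0 and 1, so multiply by 100 to get a percentile.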
Here's the code:
library(dplyr)
library(tidyr)

df %>%
  as_tibble() %>%                              # convert the matrix to a tibble
  gather(key = variable, value = val) %>%      # long form: one row per value
  group_by(variable) %>%                       # group by variable (lo and li)
  mutate(percentile = percent_rank(val) * 100) # percentile rank within each group
variable val percentile
<chr> <dbl> <dbl>
1 lo 2 20
2 lo 4 40
3 lo 5 60
4 lo 6 80
5 lo 8 100
6 lo 1 0
7 li 2 0
8 li 4 20
9 li 6 40
10 li 67 100
11 li 8 60
12 li 11 80
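In current versions of tidyr, gather() is superseded by pivot_longer(). Below is a minimal sketch of the same pipeline using pivot_longer(), assuming dplyr and tidyr 1.0.0 or later are installed. One difference is that pivot_longer() interleaves the rows (lo, li, lo, li, ...) instead of stacking all lo values first, but the percentile computed within each variable is unchanged. The name long and the final ungroup() are illustrative additions, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Same example matrix as in the question
df <- structure(c(2, 4, 5, 6, 8, 1, 2, 4, 6, 67, 8, 11), dim = c(6L, 2L),
                dimnames = list(NULL, c("lo", "li")))

long <- df %>%
  as_tibble() %>%
  # pivot_longer() replaces gather(); everything() stacks both lo and li
  pivot_longer(cols = everything(), names_to = "variable", values_to = "val") %>%
  group_by(variable) %>%
  # percent_rank() is (min_rank(val) - 1) / (n - 1): 0 = smallest, 100 = largest
  mutate(percentile = percent_rank(val) * 100) %>%
  ungroup()

long
```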
If you don't want to reshape the data into long format, compute the percent rank for each column separately:
df %>%
  as_tibble() %>%
  mutate(lo_percentile = percent_rank(lo) * 100,  # percentile rank of lo
         li_percentile = percent_rank(li) * 100)  # percentile rank of li
lo li lo_percentile li_percentile
<dbl> <dbl> <dbl> <dbl>
1 2 2 20 0
2 4 4 40 20
3 5 6 60 40
4 6 67 80 100
5 8 8 100 60
6 1 11 0 80
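With more than two columns, writing one mutate() term per column gets repetitive. Here is a sketch using across() (assuming dplyr 1.0.0 or later) that applies percent_rank() to every column and names the results with a _percentile suffix. The second part is a base-R cross-check, with no packages, that spells out the definition percent_rank() uses: (minimum rank - 1) / (number of non-missing values - 1). The names wide and base_pct are just illustrative.

```r
library(dplyr)

# Same example matrix as in the question
df <- structure(c(2, 4, 5, 6, 8, 1, 2, 4, 6, 67, 8, 11), dim = c(6L, 2L),
                dimnames = list(NULL, c("lo", "li")))

wide <- df %>%
  as_tibble() %>%
  # across() applies the same function to every existing column;
  # .names builds the output names, here lo_percentile and li_percentile
  mutate(across(everything(), ~ percent_rank(.x) * 100,
                .names = "{.col}_percentile"))

wide

# Base-R version of the same computation, column by column:
# ties get the minimum rank, and NAs are kept as NA
base_pct <- apply(df, 2, function(x) {
  (rank(x, ties.method = "min", na.last = "keep") - 1) / (sum(!is.na(x)) - 1) * 100
})

base_pct
```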