Suppose I am working with some code in R like this:
library(data.table)
dt <- data.table(x=c(1:200),y=rnorm(200))
probs <- c(0.1, 0.25, 0.5, 0.75, 0.9)
quantiles <- quantile(dt$y, probs=probs)
I would like to produce a new variable (an array or a sequence) called labels that contains formatted strings of the quantiles and their respective values. Let's say quantiles prints this:
> quantiles
10% 25% 50% 75% 90%
-1.2097339 -0.6195308 -0.0155171 0.7417443 1.2982685
How would I go about programmatically producing labels from quantiles so that printing labels emits a sequence like this:
> labels
[1] "10% at -1.20" "25% at -0.61" "50% at -0.01" "75% at 0.74" "90% at 1.29"
So how would you wire all of this together to produce labels? Given that we already have probs, we could probably simplify this by zipping probs with the values of quantiles.
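A minimal sketch of that idea, assuming the labels can be rebuilt from probs instead of being read from names(quantiles). sprintf() is vectorized, so passing 100 * probs and the quantile values side by side pairs them element by element; %g prints 10 rather than 10.000000, and %% emits a literal percent sign. The seed and the rnorm() data are arbitrary stand-ins for dt$y.

```r
set.seed(1)                      # arbitrary seed; y stands in for dt$y
y <- rnorm(200)
probs <- c(0.1, 0.25, 0.5, 0.75, 0.9)
quantiles <- quantile(y, probs = probs)

# sprintf() walks both vectors in parallel: probs[i] is paired with quantiles[i]
labels <- sprintf("%g%% at %.2f", 100 * probs, quantiles)
print(labels)
```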
My goal is to use labels to label the x axis of a density plot made with the ggplot2 package, where I want to elegantly label both the quantiles and their values together (think about something like this).
I've seen that I can inspect the quantile names programmatically with the built-in function names:
> names(quantiles)
[1] "10%" "25%" "50%" "75%" "90%"
I've also seen that I can extract the quantile values programmatically with as.vector:
> as.vector(quantiles)
[1] -1.2097339 -0.6195308 -0.0155171 0.7417443 1.2982685
But I've seen no way of zipping these two things together à la Python.
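For comparison with Python's zip(): most base R functions are vectorized, so pairing two equal-length vectors is usually implicit. When an explicit per-pair function call is wanted, Map() (a thin wrapper around mapply()) plays a similar role. A sketch using the first three values printed above:

```r
q <- c(`10%` = -1.2097339, `25%` = -0.6195308, `50%` = -0.0155171)

# Explicit "zip": Map() calls the function once per (name, value) pair
pairs <- Map(function(name, value) sprintf("%s at %.2f", name, value),
             names(q), unname(q))
labels_map <- unlist(pairs, use.names = FALSE)

# Implicit "zip": sprintf() already recycles over both vectors in parallel
labels_vec <- sprintf("%s at %.2f", names(q), q)

print(labels_map)
print(identical(labels_map, labels_vec))
```

The vectorized form is the more idiomatic one; Map() is mainly useful when the per-pair function is not itself vectorized.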
Then I want two-decimal precision on the respective quantile values, which requires something akin to applying sprintf("%.2f", ...) to each value.
Each formatted element of the sequence would probably be produced with sprintf("%s at %.2f", q, v).
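A few base R ways to get two decimals, all vectorized over the whole input. Note that they round rather than truncate, so -0.0155171 becomes "-0.02" and -1.2097339 becomes "-1.21". The values are the ones printed above.

```r
v <- c(-1.2097339, -0.6195308, -0.0155171, 0.7417443, 1.2982685)

via_sprintf <- sprintf("%.2f", v)                            # C-style format string
via_formatC <- formatC(v, format = "f", digits = 2)          # fixed notation, 2 decimals
via_format  <- format(round(v, 2), nsmall = 2, trim = TRUE)  # round, keep 2 decimals, no padding

print(via_sprintf)
```

sprintf() is the most direct fit here because the same format string can also carry the " at " text.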
I've used R on and off for two decades, but I've never been able to retain the skills deeply. The main problem I face is the plumbing: ergonomically wiring these two pieces of data together. Through other research, I found something similar to paste0(names(quantiles), '=', unlist(quantiles), collapse=' at '), but this doesn't produce the right result:
> paste0(names(quantiles), '=', unlist(quantiles), collapse=' at ')
[1] "10%=-1.20973393089285 at 25%=-0.619530792386393 at 50%=-0.0155171014275248 at 75%=0.741744347748158 at 90%=1.29826846939529"
It produces a single string (instead of a sequence), and the quantile values are printed with far too many digits.
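The collapse argument is what glues everything into one string: paste0() first builds a vector with one element per quantile, and collapse then joins that vector. Dropping collapse, putting " at " between the pieces, and formatting the numbers beforehand yields a vector instead. A sketch using the printed values:

```r
quantiles <- c(`10%` = -1.2097339, `25%` = -0.6195308, `50%` = -0.0155171,
               `75%` = 0.7417443, `90%` = 1.2982685)

# No collapse: paste0() returns one string per element
labels <- paste0(names(quantiles), " at ",
                 formatC(quantiles, format = "f", digits = 2))
print(labels)
```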
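Use sprintf for everything. It is vectorized over its arguments, so the names and the values are paired element by element without an explicit zip: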
> sprintf('%s at %.2f', names(qntls), qntls)
[1] "10% at -1.30" "25% at -0.61" "50% at -0.02" "75% at 0.63" "90% at 1.29"
For the plot, you could do something like this in base graphics:
> par(mar=c(4, 4, 1, 1)+.1)
> plot(dens <- density(dt$y), xaxt='n', main='')
> idx <- apply(abs(outer(dens$x, qntls, `-`)), 2, which.min)  # density grid point nearest each quantile
> points(qntls, dens$y[idx], type='h')
> mtext(sprintf('%s\n(%.2f)', names(qntls), qntls), 1, 1, at=qntls, cex=.8)
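Since the goal was a ggplot2 plot, here is a rough ggplot2 counterpart of the base-graphics code above. It is a sketch assuming ggplot2 is installed: the same sprintf() labels go to scale_x_continuous() at breaks placed on the quantiles, and the quantile lines are drawn up to the curve using heights interpolated with approx() from density(). I am assuming density() and geom_density() use the same defaults (Gaussian kernel, bw = "nrd0"), so the segment tops should land on or very near the drawn curve; adjust bw if they do not.

```r
library(ggplot2)

set.seed(42)
df <- data.frame(y = rnorm(200))
qntls <- quantile(df$y, probs = c(0.1, 0.25, 0.5, 0.75, 0.9))

# Axis labels: quantile name on the first line, value (2 decimals) below it
axis_labels <- sprintf("%s\n(%.2f)", names(qntls), qntls)

# Height of the density curve at each quantile, by linear interpolation
dens <- density(df$y)
seg <- data.frame(x = unname(qntls),
                  h = approx(dens$x, dens$y, xout = unname(qntls))$y)

p <- ggplot(df, aes(x = y)) +
  geom_density() +
  geom_segment(data = seg, aes(x = x, xend = x, y = 0, yend = h),
               linetype = "dashed", inherit.aes = FALSE) +
  scale_x_continuous(breaks = unname(qntls), labels = axis_labels) +
  labs(x = NULL)

print(p)
```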
Data:
> library(data.table)
> set.seed(42)
> dt <- data.table(x=1:200, y=rnorm(200))
> qntls <- quantile(dt$y, probs=c(0.1, 0.25, 0.5, 0.75, 0.9))