r hmisc

Can I split describe() on a factor variable?


I'd like to describe a response variable separately for each level of a factor variable.

I want to run something like this code:

library("Hmisc")
describe(mtcars$hp)

Except that I want separate output for each value of cyl.


Solution

  • A tidyverse / purrr solution: split the data frame by cyl, then map describe() over each piece.

    library(Hmisc)
    library(purrr)
    mtcars %>%
      split(.$cyl) %>%
      purrr::map(~ describe(.x$hp))
    #> $`4`
    #> .x$hp 
    #>        n  missing distinct     Info     Mean      Gmd      .05      .10 
    #>       11        0       10    0.995    82.64    24.51     57.0     62.0 
    #>      .25      .50      .75      .90      .95 
    #>     65.5     91.0     96.0    109.0    111.0 
    #> 
    #> lowest :  52  62  65  66  91, highest:  93  95  97 109 113
    #>                                                                       
    #> Value         52    62    65    66    91    93    95    97   109   113
    #> Frequency      1     1     1     2     1     1     1     1     1     1
    #> Proportion 0.091 0.091 0.091 0.182 0.091 0.091 0.091 0.091 0.091 0.091
    #> 
    #> $`6`
    #> .x$hp 
    #>        n  missing distinct     Info     Mean      Gmd 
    #>        7        0        4    0.911    122.3    23.71 
    #>                                   
    #> Value        105   110   123   175
    #> Frequency      1     3     2     1
    #> Proportion 0.143 0.429 0.286 0.143
    #> 
    #> $`8`
    #> .x$hp 
    #>        n  missing distinct     Info     Mean      Gmd 
    #>       14        0        9    0.985    209.2    56.69 
    #> 
    #> lowest : 150 175 180 205 215, highest: 215 230 245 264 335
    #>                                                                 
    #> Value        150   175   180   205   215   230   245   264   335
    #> Frequency      2     2     3     1     1     1     2     1     1
    #> Proportion 0.143 0.143 0.214 0.071 0.071 0.071 0.143 0.071 0.071
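
The same split-then-apply idea can probably be written without purrr. The following is a minimal sketch, not part of the original answer, that assumes only base R's split() and lapply() plus Hmisc's describe(). The names hp_by_cyl and described are illustrative.

```r
library(Hmisc)

# Split the hp vector into one numeric vector per cyl value (4, 6, 8).
hp_by_cyl <- split(mtcars$hp, mtcars$cyl)

# Apply describe() to each piece; the result is a named list,
# with one describe object per level of cyl.
described <- lapply(hp_by_cyl, describe)

# Print everything, or pull out a single group by name.
described
described[["6"]]
```

Because the result is an ordinary named list, a single group can be inspected with described[["4"]], described[["6"]], or described[["8"]]. The variable label printed at the top of each block will differ from the `.x$hp` shown above, since it is taken from the expression passed to describe().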