r tidyverse purrr psych

How to combine a list of outputs from `psych` into a single dataframe?


Important packages

library(psych)
library(dplyr)

I'm using the iris dataset built into R.

I can use the following syntax from the psych package to easily get descriptive statistics by group, in this case by Species.

Input

describe(iris ~ Species)

This gives me the following output:

 Descriptive statistics by group 
group: setosa
             vars  n mean   sd median trimmed  mad min max range skew kurtosis   se
Sepal.Length    1 50 5.01 0.35    5.0    5.00 0.30 4.3 5.8   1.5 0.11    -0.45 0.05
Sepal.Width     2 50 3.43 0.38    3.4    3.42 0.37 2.3 4.4   2.1 0.04     0.60 0.05
Petal.Length    3 50 1.46 0.17    1.5    1.46 0.15 1.0 1.9   0.9 0.10     0.65 0.02
Petal.Width     4 50 0.25 0.11    0.2    0.24 0.00 0.1 0.6   0.5 1.18     1.26 0.01
Species*        5 50 1.00 0.00    1.0    1.00 0.00 1.0 1.0   0.0  NaN      NaN 0.00
-------------------------------------------------------------------------------------------------------------------------- 
group: versicolor
             vars  n mean   sd median trimmed  mad min max range  skew kurtosis   se
Sepal.Length    1 50 5.94 0.52   5.90    5.94 0.52 4.9 7.0   2.1  0.10    -0.69 0.07
Sepal.Width     2 50 2.77 0.31   2.80    2.78 0.30 2.0 3.4   1.4 -0.34    -0.55 0.04
Petal.Length    3 50 4.26 0.47   4.35    4.29 0.52 3.0 5.1   2.1 -0.57    -0.19 0.07
Petal.Width     4 50 1.33 0.20   1.30    1.32 0.22 1.0 1.8   0.8 -0.03    -0.59 0.03
Species*        5 50 2.00 0.00   2.00    2.00 0.00 2.0 2.0   0.0   NaN      NaN 0.00
-------------------------------------------------------------------------------------------------------------------------- 
group: virginica
             vars  n mean   sd median trimmed  mad min max range  skew kurtosis   se
Sepal.Length    1 50 6.59 0.64   6.50    6.57 0.59 4.9 7.9   3.0  0.11    -0.20 0.09
Sepal.Width     2 50 2.97 0.32   3.00    2.96 0.30 2.2 3.8   1.6  0.34     0.38 0.05
Petal.Length    3 50 5.55 0.55   5.55    5.51 0.67 4.5 6.9   2.4  0.52    -0.37 0.08
Petal.Width     4 50 2.03 0.27   2.00    2.03 0.30 1.4 2.5   1.1 -0.12    -0.75 0.04
Species*        5 50 3.00 0.00   3.00    3.00 0.00 3.0 3.0   0.0   NaN      NaN 0.00

Problem

Now, I want to merge (join, or combine) these three outputs into a single dataframe (preferably a tibble). I need to do it in an efficient and concise way.

I know how to do that manually. See below:

m <- describe(iris ~ Species)

a <- m$setosa %>% 
  as_tibble(rownames = "var") %>% 
  mutate(group = "setosa")

b <- m$versicolor %>% 
  as_tibble(rownames = "var") %>% 
  mutate(group = "versicolor")

c <- m$virginica %>% 
  as_tibble(rownames = "var") %>% 
  mutate(group = "virginica")

full_join(a,b) %>% 
  full_join(c) %>% 
  filter(var != "Species*") %>% 
  select(group, everything())

Expected Output

    # A tibble: 12 x 15
   group      var           vars     n  mean    sd median trimmed   mad   min   max range    skew kurtosis     se
   <chr>      <chr>        <int> <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>    <dbl>  <dbl>
 1 setosa     Sepal.Length     1    50 5.01  0.352   5      5.00  0.297   4.3   5.8   1.5  0.113    -0.451 0.0498
 2 setosa     Sepal.Width      2    50 3.43  0.379   3.4    3.42  0.371   2.3   4.4   2.1  0.0387    0.596 0.0536
 3 setosa     Petal.Length     3    50 1.46  0.174   1.5    1.46  0.148   1     1.9   0.9  0.100     0.654 0.0246
 4 setosa     Petal.Width      4    50 0.246 0.105   0.2    0.238 0       0.1   0.6   0.5  1.18      1.26  0.0149
 5 versicolor Sepal.Length     1    50 5.94  0.516   5.9    5.94  0.519   4.9   7     2.1  0.0991   -0.694 0.0730
 6 versicolor Sepal.Width      2    50 2.77  0.314   2.8    2.78  0.297   2     3.4   1.4 -0.341    -0.549 0.0444
 7 versicolor Petal.Length     3    50 4.26  0.470   4.35   4.29  0.519   3     5.1   2.1 -0.571    -0.190 0.0665
 8 versicolor Petal.Width      4    50 1.33  0.198   1.3    1.32  0.222   1     1.8   0.8 -0.0293   -0.587 0.0280
 9 virginica  Sepal.Length     1    50 6.59  0.636   6.5    6.57  0.593   4.9   7.9   3    0.111    -0.203 0.0899
10 virginica  Sepal.Width      2    50 2.97  0.322   3      2.96  0.297   2.2   3.8   1.6  0.344     0.380 0.0456
11 virginica  Petal.Length     3    50 5.55  0.552   5.55   5.51  0.667   4.5   6.9   2.4  0.517    -0.365 0.0780
12 virginica  Petal.Width      4    50 2.03  0.275   2      2.03  0.297   1.4   2.5   1.1 -0.122    -0.754 0.0388

I have the impression I can do that with some function from purrr. But I'm having a hard time putting it together.


Solution

  • Convert each list element to a tibble/data.frame with map and bind the results by row into a single data frame (the _dfr suffix)

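    library(tibble)
    library(purrr)
    library(dplyr)
    # For each group: move the row names into a "var" column, then drop the
    # last row (Species*). .id = "group" stores the list names in "group".
    map_dfr(m, ~ .x %>% 
            as_tibble(rownames = "var") %>% 
            slice(-n()), .id = "group")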
    

    -output

    # A tibble: 12 × 15
       group      var           vars     n  mean    sd median trimmed   mad   min   max range    skew kurtosis     se
       <chr>      <chr>        <int> <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>    <dbl>  <dbl>
     1 setosa     Sepal.Length     1    50 5.01  0.352   5      5.00  0.297   4.3   5.8   1.5  0.113    -0.451 0.0498
     2 setosa     Sepal.Width      2    50 3.43  0.379   3.4    3.42  0.371   2.3   4.4   2.1  0.0387    0.596 0.0536
     3 setosa     Petal.Length     3    50 1.46  0.174   1.5    1.46  0.148   1     1.9   0.9  0.100     0.654 0.0246
     4 setosa     Petal.Width      4    50 0.246 0.105   0.2    0.238 0       0.1   0.6   0.5  1.18      1.26  0.0149
     5 versicolor Sepal.Length     1    50 5.94  0.516   5.9    5.94  0.519   4.9   7     2.1  0.0991   -0.694 0.0730
     6 versicolor Sepal.Width      2    50 2.77  0.314   2.8    2.78  0.297   2     3.4   1.4 -0.341    -0.549 0.0444
     7 versicolor Petal.Length     3    50 4.26  0.470   4.35   4.29  0.519   3     5.1   2.1 -0.571    -0.190 0.0665
     8 versicolor Petal.Width      4    50 1.33  0.198   1.3    1.32  0.222   1     1.8   0.8 -0.0293   -0.587 0.0280
     9 virginica  Sepal.Length     1    50 6.59  0.636   6.5    6.57  0.593   4.9   7.9   3    0.111    -0.203 0.0899
    10 virginica  Sepal.Width      2    50 2.97  0.322   3      2.96  0.297   2.2   3.8   1.6  0.344     0.380 0.0456
    11 virginica  Petal.Length     3    50 5.55  0.552   5.55   5.51  0.667   4.5   6.9   2.4  0.517    -0.365 0.0780
    12 virginica  Petal.Width      4    50 2.03  0.275   2      2.03  0.297   1.4   2.5   1.1 -0.122    -0.754 0.0388
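
As a side note, map_dfr() is marked as superseded from purrr 1.0.0 onward in favour of map() followed by list_rbind(). Below is a minimal sketch of the same idea in that style, assuming purrr >= 1.0.0 is installed. The names `groups`, `pieces` and `combined` are only illustrative, and the Species* row is dropped by name rather than by position.

```r
library(psych)
library(dplyr)
library(tibble)
library(purrr)

m <- describe(iris ~ Species)

# Iterate over the group names instead of over the classed object itself,
# so the result does not depend on how purrr treats the describe() output.
groups <- set_names(names(m))

# One tibble per group, with the row names moved into a "var" column.
pieces <- map(groups, function(g) as_tibble(m[[g]], rownames = "var"))

# Row-bind the pieces; the list names end up in the "group" column.
combined <- pieces %>%
  list_rbind(names_to = "group") %>%
  filter(var != "Species*")

combined
```

Filtering on the row label is a little more defensive than slice(-n()), which relies on Species* being the last row of every group.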
    

    Converting each element to a data.frame/tibble removes its extra classes (psych, describe). The same can be done by resetting the class attribute directly:

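    map_dfr(m, ~ {
        # Overwrite the psych/describe classes with the tibble classes.
        # The row names are not turned into a column here, and the
        # Species* row is kept.
        class(.x) <- c("tbl_df", "tbl", "data.frame")
        .x
      },
      .id = "group")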
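
The class change can be checked directly. This is only a small sketch for inspection; `before` and `after` are illustrative names.

```r
library(psych)
library(tibble)

m <- describe(iris ~ Species)

# Classes of one list element as returned by describe()
before <- class(m$setosa)

# Classes after converting the same element to a tibble
after <- class(as_tibble(m$setosa, rownames = "var"))

print(before)
print(after)
```

If "psych" and "describe" show up in `before` but not in `after`, the conversion has stripped the extra classes, which is why the pieces can then be bound together as ordinary tibbles.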