Important packages
library(psych)
library(dplyr)
I'm using the iris dataset built into R.
I can use the following syntax from the psych package to easily get descriptive statistics by group, in this case by Species.
describe(iris ~ Species)
This gives me the following output:
Descriptive statistics by group
group: setosa
vars n mean sd median trimmed mad min max range skew kurtosis se
Sepal.Length 1 50 5.01 0.35 5.0 5.00 0.30 4.3 5.8 1.5 0.11 -0.45 0.05
Sepal.Width 2 50 3.43 0.38 3.4 3.42 0.37 2.3 4.4 2.1 0.04 0.60 0.05
Petal.Length 3 50 1.46 0.17 1.5 1.46 0.15 1.0 1.9 0.9 0.10 0.65 0.02
Petal.Width 4 50 0.25 0.11 0.2 0.24 0.00 0.1 0.6 0.5 1.18 1.26 0.01
Species* 5 50 1.00 0.00 1.0 1.00 0.00 1.0 1.0 0.0 NaN NaN 0.00
--------------------------------------------------------------------------------------------------------------------------
group: versicolor
vars n mean sd median trimmed mad min max range skew kurtosis se
Sepal.Length 1 50 5.94 0.52 5.90 5.94 0.52 4.9 7.0 2.1 0.10 -0.69 0.07
Sepal.Width 2 50 2.77 0.31 2.80 2.78 0.30 2.0 3.4 1.4 -0.34 -0.55 0.04
Petal.Length 3 50 4.26 0.47 4.35 4.29 0.52 3.0 5.1 2.1 -0.57 -0.19 0.07
Petal.Width 4 50 1.33 0.20 1.30 1.32 0.22 1.0 1.8 0.8 -0.03 -0.59 0.03
Species* 5 50 2.00 0.00 2.00 2.00 0.00 2.0 2.0 0.0 NaN NaN 0.00
--------------------------------------------------------------------------------------------------------------------------
group: virginica
vars n mean sd median trimmed mad min max range skew kurtosis se
Sepal.Length 1 50 6.59 0.64 6.50 6.57 0.59 4.9 7.9 3.0 0.11 -0.20 0.09
Sepal.Width 2 50 2.97 0.32 3.00 2.96 0.30 2.2 3.8 1.6 0.34 0.38 0.05
Petal.Length 3 50 5.55 0.55 5.55 5.51 0.67 4.5 6.9 2.4 0.52 -0.37 0.08
Petal.Width 4 50 2.03 0.27 2.00 2.03 0.30 1.4 2.5 1.1 -0.12 -0.75 0.04
Species* 5 50 3.00 0.00 3.00 3.00 0.00 3.0 3.0 0.0 NaN NaN 0.00
Now I want to merge (join, or combine) these three outputs into a single data frame (preferably a tibble). I need to do it in an efficient and concise way.
I know how to do that manually. See below:
m <- describe(iris ~ Species)
a <- m$setosa %>%
  as_tibble(rownames = "var") %>%
  mutate(group = "setosa")
b <- m$versicolor %>%
  as_tibble(rownames = "var") %>%
  mutate(group = "versicolor")
c <- m$virginica %>%
  as_tibble(rownames = "var") %>%
  mutate(group = "virginica")
full_join(a, b) %>%
  full_join(c) %>%
  filter(var != "Species*") %>%
  select(group, everything())
# A tibble: 12 x 15
group var vars n mean sd median trimmed mad min max range skew kurtosis se
<chr> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 setosa Sepal.Length 1 50 5.01 0.352 5 5.00 0.297 4.3 5.8 1.5 0.113 -0.451 0.0498
2 setosa Sepal.Width 2 50 3.43 0.379 3.4 3.42 0.371 2.3 4.4 2.1 0.0387 0.596 0.0536
3 setosa Petal.Length 3 50 1.46 0.174 1.5 1.46 0.148 1 1.9 0.9 0.100 0.654 0.0246
4 setosa Petal.Width 4 50 0.246 0.105 0.2 0.238 0 0.1 0.6 0.5 1.18 1.26 0.0149
5 versicolor Sepal.Length 1 50 5.94 0.516 5.9 5.94 0.519 4.9 7 2.1 0.0991 -0.694 0.0730
6 versicolor Sepal.Width 2 50 2.77 0.314 2.8 2.78 0.297 2 3.4 1.4 -0.341 -0.549 0.0444
7 versicolor Petal.Length 3 50 4.26 0.470 4.35 4.29 0.519 3 5.1 2.1 -0.571 -0.190 0.0665
8 versicolor Petal.Width 4 50 1.33 0.198 1.3 1.32 0.222 1 1.8 0.8 -0.0293 -0.587 0.0280
9 virginica Sepal.Length 1 50 6.59 0.636 6.5 6.57 0.593 4.9 7.9 3 0.111 -0.203 0.0899
10 virginica Sepal.Width 2 50 2.97 0.322 3 2.96 0.297 2.2 3.8 1.6 0.344 0.380 0.0456
11 virginica Petal.Length 3 50 5.55 0.552 5.55 5.51 0.667 4.5 6.9 2.4 0.517 -0.365 0.0780
12 virginica Petal.Width 4 50 2.03 0.275 2 2.03 0.297 1.4 2.5 1.1 -0.122 -0.754 0.0388
I have the impression I can do that with some function from purrr, but I'm having a hard time putting it together.
Convert each list element to a tibble/data.frame with map and bind them into a single dataset (the _dfr suffix):
library(tibble)
library(purrr)
library(dplyr)
# slice(-n()) drops the last row of each element (the Species* row)
map_dfr(m, ~ .x %>%
          as_tibble(rownames = "var") %>%
          slice(-n()), .id = "group")
Output:
# A tibble: 12 × 15
group var vars n mean sd median trimmed mad min max range skew kurtosis se
<chr> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 setosa Sepal.Length 1 50 5.01 0.352 5 5.00 0.297 4.3 5.8 1.5 0.113 -0.451 0.0498
2 setosa Sepal.Width 2 50 3.43 0.379 3.4 3.42 0.371 2.3 4.4 2.1 0.0387 0.596 0.0536
3 setosa Petal.Length 3 50 1.46 0.174 1.5 1.46 0.148 1 1.9 0.9 0.100 0.654 0.0246
4 setosa Petal.Width 4 50 0.246 0.105 0.2 0.238 0 0.1 0.6 0.5 1.18 1.26 0.0149
5 versicolor Sepal.Length 1 50 5.94 0.516 5.9 5.94 0.519 4.9 7 2.1 0.0991 -0.694 0.0730
6 versicolor Sepal.Width 2 50 2.77 0.314 2.8 2.78 0.297 2 3.4 1.4 -0.341 -0.549 0.0444
7 versicolor Petal.Length 3 50 4.26 0.470 4.35 4.29 0.519 3 5.1 2.1 -0.571 -0.190 0.0665
8 versicolor Petal.Width 4 50 1.33 0.198 1.3 1.32 0.222 1 1.8 0.8 -0.0293 -0.587 0.0280
9 virginica Sepal.Length 1 50 6.59 0.636 6.5 6.57 0.593 4.9 7.9 3 0.111 -0.203 0.0899
10 virginica Sepal.Width 2 50 2.97 0.322 3 2.96 0.297 2.2 3.8 1.6 0.344 0.380 0.0456
11 virginica Petal.Length 3 50 5.55 0.552 5.55 5.51 0.667 4.5 6.9 2.4 0.517 -0.365 0.0780
12 virginica Petal.Width 4 50 2.03 0.275 2 2.03 0.297 1.4 2.5 1.1 -0.122 -0.754 0.0388
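As of purrr 1.0.0, map_dfr() is marked as superseded, so it may be useful to have a version of the same steps that does not depend on it. The following is a minimal sketch using base lapply() with dplyr::bind_rows(). It assumes that names(m) returns the group labels (as the `.id` column above suggests) and that dplyr is version 1.0 or later for the `.before` argument; the names `groups` and `res` are only for illustration.

```r
library(psych)
library(tibble)
library(dplyr)

m <- describe(iris ~ Species)

# Group labels, assumed to be the names of the list returned by describe()
groups <- names(m)

res <- lapply(groups, function(g) {
  m[[g]] %>%
    as_tibble(rownames = "var") %>%   # keep the variable names as a column
    mutate(group = g, .before = 1)    # label every row with its group
}) %>%
  bind_rows() %>%                     # stack the per-group tibbles
  filter(var != "Species*")           # drop the grouping-variable row by name

res
```

Filtering on `var != "Species*"`, as in the manual version, does not rely on the grouping variable being the last row, which slice(-n()) does.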
Converting to a data.frame/tibble removes the extra classes (psych, describe). The class can also be set directly:
# The .id column holds the list names (the species), so it is named "group" here
map_dfr(m, ~ {
  class(.x) <- c("tbl_df", "data.frame")
  .x
}, .id = "group")
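To see the class change for yourself, you can compare the class vector of one list element before and after conversion. This is a small sketch; the variable names `before` and `after` are only for illustration, and the exact classes may differ between psych versions.

```r
library(psych)
library(tibble)

m <- describe(iris ~ Species)

# Class of one per-group element as returned by describe()
before <- class(m$setosa)

# Class after conversion; as_tibble() builds a plain tibble
after <- class(as_tibble(m$setosa, rownames = "var"))

before
after
```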