I am trying to generate a data frame of intercepts and coefficients for multiple different linear regression models generated from subsets of a data set. I cannot share my data unfortunately but I can explain it using the mtcars set. I am creating a regression model predicting mpg from cyl, hp, and wt for each value of carb. After digging around for a while, I found many examples that do what I want but only for a model with 1 predictor (like mpg~wt). It all falls apart when I add the other terms. This is what I have based my work so far on: https://community.rstudio.com/t/extract-slopes-by-group-broom-dplyr/2751/8 Efficient way to extract coefficients from many linear regression lines
This is what I have tried
library(tidyverse);library(broom)
df <- mtcars
tryme <- df %>%
split(.$carb)%>%
map(~lm(mpg~cyl+hp+wt,data=.x)) %>%
map_df(tidy)
with this result
term estimate std.error statistic p.value
1 (Intercept) 46.034633 7.68430306 5.99073626 9.31E-03
2 cyl 2.650503624 4.14371413 0.63964442 5.68E-01
3 hp -0.230007961 0.1573354 -1.46189576 2.40E-01
4 wt -5.231999683 9.23136027 -0.56676368 6.11E-01
5 (Intercept) 39.84451509 2.95537984 13.48202845 1.03E-05
6 cyl -0.846094229 0.93995084 -0.90014732 4.03E-01
7 hp -0.007452737 0.03998485 -0.18638901 8.58E-01
8 wt -4.133340298 1.42757472 -2.89535829 2.75E-02
9 (Intercept) 17.50267062 22.13706712 0.79064993 5.74E-01
10 cyl NA NA NA NA
11 hp NA NA NA NA
12 wt -0.3115727 5.73067255 -0.05436931 9.65E-01
13 (Intercept) 45.33390978 12.93999647 3.50339429 1.28E-02
14 cyl -4.195214198 3.492613 -1.20116778 2.75E-01
15 hp 0.029361878 0.04927895 0.59583008 5.73E-01
16 wt -1.239041102 1.03937377 -1.19210349 2.78E-01
17 (Intercept) 19.7 NaN NaN NaN
18 cyl NA NA NA NA
19 hp NA NA NA NA
20 wt NA NA NA NA
21 (Intercept) 15 NaN NaN NaN
22 cyl NA NA NA NA
23 hp NA NA NA NA
24 wt NA NA NA NA
what I want is a table that looks like this:
carb intercept cyl hp wt
1 46.034633 2.650503624 -0.230007961 -5.231999683
2 39.84451509 -0.846094229 -0.007452737 -4.133340298
3 17.50267062 NA NA -0.3115727
4 45.33390978 -4.195214198 0.029361878 -1.239041102
6 19.7 NA NA NA
8 15 NA NA NA
I don't know how to bring the value of the grouping variable in. If I can get that added to what I already have I know how to transpose the data into the form I need.
I am updating a bit, as an answer here gave me what I wanted, but is causing an error that I can't seem to fix. the solution uses lmList from the nlme library. The code I am using is this
fit <- lmList(SentLength~Unified_UPPER+Unified_LOWER+GRADE | SentType, data=df, na.omit)
and this is the data I am running it on:
SentType SentLength Unified_UPPER Unified_LOWER GRADE
Jail 0.06578947 0.06666667 0.06666667 3
Other 0 6 0 4
Other 0 6 6 1
Probation 6 0 0 1
Other 0 9 0 6
Jail 1.41447368 6 0 3
Probation 6 0 0 1
Probation 6 0 0 1
Probation 12 0 0 2
Jail 6 16 6 6
Prison 36 27 21 5
Probation 24 9 0 6
Jail 0.23026316 0.5 0 1
Jail 0.09868421 0.06666667 0.06666667 1
Probation 60 27 21 6
Probation 6 1 0 1
Probation 24 0 0 3
Prison 36 54 36 8
Probation 24 9 0 6
Probation 6 0 0 1
Probation 0 0.06666667 0.06666667 1
Probation 24 0 0 3
Jail 0.13157895 1 0.06666667 1
Jail 0.09868421 1 0.1 1
Prison 42 60 48 8
Probation 6 0 0 2
Other 0 0 0 1
Prison 6 6 6 1
Prison 6 6 6 1
Other 0 16 9 7
I get this error: Error in na.fail.default(data) : missing values in object despite no missing values in the data. Can anyone help with this?
It always amazes me that people insist on using tidyverse mumbo-jumbo for this when it is so incredibly easy with package nlme, which is a "recommended package", meaning it should be part of your R installation and you don't even need to install it.
library(nlme)
fit <- lmList(mpg ~ cyl + hp + wt | carb, data = mtcars)
coef(fit)
# (Intercept) cyl hp wt
#1 46.03463 2.6505036 -0.230007961 -5.2319997
#2 39.84452 -0.8460942 -0.007452737 -4.1333403
#3 17.50267 NA NA -0.3115727
#4 45.33391 -4.1952142 0.029361878 -1.2390411
#6 19.70000 NA NA NA
#8 15.00000 NA NA NA
Note that the row names are the carb
values.
There is also a summary
method producing convenient output. However, read help("summary.lmList")
regarding the pool
parameter if you use it.