I am trying to learn to write functions and exploring making a function to do an ANOVA and post F test. I have simplified this to the problem which is obtaining emmeans and associated all pairwise comparisons. The issue is the pairwise comparisons do not work??? I am real new at functions and have tried a lot of ways to fix this to no avail...
I am using the diamonds data set for reproducibility...
Here is the code:
# libraries
library(tidyverse)
library(car)
library(emmeans)
library(readxl)
library(rlang)
library(broom)
dia <- diamonds
dia$cut <- as.factor(as.character(dia$cut))
results_LM <- function(data, iv, dv) {
iv_name <- as.character(substitute(iv))
dv_name <- as.character(substitute(dv))
formula_str <- paste(dv_name, "~", iv_name)
aov.model <- lm(formula_str, data = data)
emm.means <- emmeans(aov.model,
specs = iv_name,
by = iv_name)
emm.pairs <- pairs(emm.means)
output <- list(
aov_model_summary = summary(aov.model),
emm.means,
emm.pairs)
return(output)
}
results_LM(dia, cut, price)
This issue is the emmeans output looks odd and the pairwise comparisons are missing
[[3]]
cut = Fair:
contrast estimate SE df z.ratio p.value
(nothing) nonEst NA NA NA NA
cut = Good:
contrast estimate SE df z.ratio p.value
(nothing) nonEst NA NA NA NA
cut = Ideal:
contrast estimate SE df z.ratio p.value
(nothing) nonEst NA NA NA NA
cut = Premium:
contrast estimate SE df z.ratio p.value
(nothing) nonEst NA NA NA NA
cut = Very Good:
contrast estimate SE df z.ratio p.value
(nothing) nonEst NA NA NA NA```
Note that when I do the anova and emmeans post F test outside of the function it works fine?
Any help is appreciated and pointers to learn to write functions like this on dataframes would be GREAT. Thanks Bill
You were really close. I think the only thing is you don't need the by
argument in the call to emmeans()
. I might also suggest using reformulate()
to make the formula. See below:
library(tidyverse)
library(emmeans)
dia <- diamonds
dia$cut <- as.factor(as.character(dia$cut))
results_LM <- function(data, iv, dv) {
iv_name <- as.character(substitute(iv))
dv_name <- as.character(substitute(dv))
formula_str <- reformulate(iv_name, response=dv_name)
aov.model <- lm(formula_str, data = data)
emm.means <- emmeans(aov.model,
specs = iv_name)
emm.pairs <- pairs(emm.means)
output <- list(
aov_model_summary = summary(aov.model),
emm.means,
emm.pairs)
names(output) <- c("Model", "Marginal Means", "Pairwise Comparisons")
return(output)
}
results_LM(dia, cut, price)
#> $Model
#>
#> Call:
#> lm(formula = formula_str, data = data)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -4258 -2741 -1494 1360 15348
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 4358.76 98.79 44.122 < 2e-16 ***
#> cutGood -429.89 113.85 -3.776 0.000160 ***
#> cutIdeal -901.22 102.41 -8.800 < 2e-16 ***
#> cutPremium 225.50 104.40 2.160 0.030772 *
#> cutVery Good -377.00 105.16 -3.585 0.000338 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 3964 on 53935 degrees of freedom
#> Multiple R-squared: 0.01286, Adjusted R-squared: 0.01279
#> F-statistic: 175.7 on 4 and 53935 DF, p-value: < 2.2e-16
#>
#>
#> $`Marginal Means`
#> cut emmean SE df lower.CL upper.CL
#> Fair 4359 98.8 53935 4165 4552
#> Good 3929 56.6 53935 3818 4040
#> Ideal 3458 27.0 53935 3405 3510
#> Premium 4584 33.8 53935 4518 4650
#> Very Good 3982 36.1 53935 3911 4052
#>
#> Confidence level used: 0.95
#>
#> $`Pairwise Comparisons`
#> contrast estimate SE df t.ratio p.value
#> Fair - Good 429.9 113.8 53935 3.776 0.0015
#> Fair - Ideal 901.2 102.4 53935 8.800 <.0001
#> Fair - Premium -225.5 104.4 53935 -2.160 0.1950
#> Fair - Very Good 377.0 105.2 53935 3.585 0.0031
#> Good - Ideal 471.3 62.7 53935 7.517 <.0001
#> Good - Premium -655.4 65.9 53935 -9.946 <.0001
#> Good - Very Good -52.9 67.1 53935 -0.788 0.9341
#> Ideal - Premium -1126.7 43.2 53935 -26.067 <.0001
#> Ideal - Very Good -524.2 45.1 53935 -11.636 <.0001
#> Premium - Very Good 602.5 49.4 53935 12.198 <.0001
#>
#> P value adjustment: tukey method for comparing a family of 5 estimates
If you wanted the rlang
evaluation instead, you could replace as.character(substitute())
with as_label(enquo())
.
Since you're trying to learn how functions work and what you can do with them, let me demonstrate one other thing. You could give your returned result a class (e.g., "lm_pwc"), by setting the class of the output list with class(output) <- "lm_pwc"
.
library(tidyverse)
library(emmeans)
dia <- diamonds
dia$cut <- as.factor(as.character(dia$cut))
results_LM <- function(data, iv, dv) {
iv_name <- as_label(enquo(iv))
dv_name <- as_label(enquo(dv))
formula_str <- reformulate(iv_name, response=dv_name)
aov.model <- lm(formula_str, data = data)
emm.means <- emmeans(aov.model,
specs = iv_name)
emm.pairs <- pairs(emm.means)
output <- list(
aov_model_summary = summary(aov.model),
means = emm.means,
pairs = emm.pairs)
class(output) <- "lm_pwc"
return(output)
}
Then, you could make a separate print.lm_pwc()
function that would allow you to print only the desired part of the results. For example, the function below by default prints only the pairwise comparisons, but the returned result contains all three elements in the initial return. This allows you to return more objects than you print. Here's an example:
print.lm_pwc <- function(x, ..., which=c("pairs", "model", "means", "all")){
w <- match.arg(which, several.ok=FALSE)
if(w %in% c("model", "all")){
cat("Model Results:\n")
print(x[[1]])
}
if(w %in% c("means", "all")){
cat("Marginal Means:\n")
print(x[[2]])
}
if(w %in% c("pairs", "all")){
cat("Pairwise Comparisons:\n")
print(x[[3]])
}
}
res <- results_LM(dia, cut, price)
res
#> Pairwise Comparisons:
#> contrast estimate SE df t.ratio p.value
#> Fair - Good 429.9 113.8 53935 3.776 0.0015
#> Fair - Ideal 901.2 102.4 53935 8.800 <.0001
#> Fair - Premium -225.5 104.4 53935 -2.160 0.1950
#> Fair - Very Good 377.0 105.2 53935 3.585 0.0031
#> Good - Ideal 471.3 62.7 53935 7.517 <.0001
#> Good - Premium -655.4 65.9 53935 -9.946 <.0001
#> Good - Very Good -52.9 67.1 53935 -0.788 0.9341
#> Ideal - Premium -1126.7 43.2 53935 -26.067 <.0001
#> Ideal - Very Good -524.2 45.1 53935 -11.636 <.0001
#> Premium - Very Good 602.5 49.4 53935 12.198 <.0001
#>
#> P value adjustment: tukey method for comparing a family of 5 estimates
Notice that the above only prints the pairwise comparisons, but the returned object contains all three elements.
names(res)
#> [1] "aov_model_summary" "means" "pairs"
You could print just the model with print(res, which = "model")
or the marginal means with print(res, which="means")
or you could show all results with print(res, which="all")
. Note, that the first choice in the function definition (e.g., "pairs" above) is the default.
Created on 2023-08-09 with reprex v2.0.2