r, statistics, survey, weighted, kruskal-wallis

Error in kruskal_test (coin) with Weighted Data: "invalid class 'IndependenceProblem' object"


I am trying to perform a weighted Kruskal-Wallis test for each group in a dataset, but I keep encountering the following error:

Error in kruskal_test: Error in validObject(.Object): invalid class “IndependenceProblem” object: FALSE

Here is the code I am using:

# Load necessary libraries
library(tidyverse)
library(coin)

# Set seed for reproducibility
set.seed(1)

# Create the dataset
punteggi <- tibble(
  Codice = paste0("cod_", 1:200),
  Regione = sample(c("FVG", "Lazio", "Sicilia"), size = 200, replace = TRUE, prob = c(.4, .2, .4)),
  Genere = sample(c("Femminile", "Maschile"), size = 200, replace = TRUE, prob = c(.6, .4)),
  Area = sample(c("Urban", "Suburban"), size = 200, replace = TRUE, prob = c(.6, .4))
)

punteggi <- punteggi |>
  group_by(Regione, Genere, Area) |>
  mutate(
    mu = runif(1, 4, 7),
    sd = runif(1, 1, 2),
    weight = 1/n()
  ) |> 
  ungroup() |>
  rowwise() |>
  mutate(pt_tot = rnorm(1, mu, sd))

print(punteggi)

# Function to perform the weighted Kruskal-Wallis test for each group
kruskal_per_gruppo <- function(data) {
  test_result <- tryCatch({
    data <- data |> mutate(across(weight, ~./sum(.)))
    kruskal_test(pt_tot ~ value, data = data, weights = ~ weight)
  }, error = function(e) {
    message("Error in kruskal_test: ", e)
    return(NULL)
  })
  return(test_result)
}

# Apply the Kruskal-Wallis test for each group
results <- punteggi |> 
  pivot_longer(c(Regione, Genere, Area)) |> 
  group_by(name) |> 
  mutate(
    across(value, as.factor),
    kruskal = kruskal.test(pt_tot ~ value)$p.value
  ) |> 
  nest() |> 
  mutate(kruskal_weighted = map(data, kruskal_per_gruppo))

print(results)

What might be causing this error and how can I resolve it? I have checked that the weights are correctly normalized, but it seems there is an issue with creating the IndependenceProblem object. Any suggestions would be greatly appreciated!


Solution

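  • The weights for all of the coin package tests must be integers. They are treated as frequency weights, i.e., a weight of 3 is treated like having three copies of that observation. Normalizing the weights so that they sum to 1 makes them fractional, so the IndependenceProblem object fails its validity check and kruskal_test stops with the error shown in the question.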
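
To illustrate the point, here is a minimal sketch on hypothetical toy data (the data frame `d` and the columns `y`, `g` and `w_int` are made up for this example and are not from the question). It assumes that whole-number weights are an acceptable stand-in for your weighting scheme, which may not hold for survey or sampling weights.

```r
library(coin)

set.seed(1)

# Hypothetical toy data: `w_int` holds whole-number frequency weights
d <- data.frame(
  y     = rnorm(30),
  g     = factor(rep(c("a", "b", "c"), each = 10)),
  w_int = sample(1:3, size = 30, replace = TRUE)
)

# Check that every weight is a non-negative whole number before calling coin
stopifnot(all(d$w_int >= 0), all(d$w_int == round(d$w_int)))

# Weights are passed as a one-sided formula, as in the question
res <- kruskal_test(y ~ g, data = d, weights = ~ w_int)
pvalue(res)
```

If your weights are sampling weights rather than counts, rescaling and rounding them to integers changes what the test means, so treat that as a workaround rather than a proper weighted analysis.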