I am hoping someone can help me.
I am calculating Cramer's V for pairs of categorical variables in R. Here's an example of the code:
```r
library(rcompanion)

# Cramer's V
df1 <- subset(ACCIDENT_MASTER_single, select = c("SEVERITY", "ATMOSPH_COND"))
# Convert into a numeric matrix
df3 <- data.matrix(df1)
# Calculate Cramer's V
cramerV(df3)
```
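For comparison, here is a minimal base-R sketch of Cramer's V computed from a contingency table, V = sqrt(chi^2 / (n * (min(r, c) - 1))). The helper `cramers_v()` and the toy columns are hypothetical and for illustration only; this is not rcompanion's code and not the accident data. rcompanion documents `cramerV(x)` as taking a two-way table or matrix (or two vectors of observations as `x` and `y`), so an n-by-2 matrix from `data.matrix()` would presumably be read as an n-row contingency table of coded values rather than as two variables to cross-tabulate. The sketch compares those two readings:

```r
# Hypothetical helper: Cramer's V from a contingency table (base R only)
cramers_v <- function(tab) {
  tab <- as.table(tab)
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  n <- sum(tab)
  unname(sqrt(chi2 / (n * (min(dim(tab)) - 1))))
}

# Toy data (made up): two coded categorical columns
toy <- data.frame(
  SEVERITY     = c(3, 2, 1, 3, 3, 2, 1, 2, 3, 3),
  ATMOSPH_COND = c("1", "1", "2", "1", "3", "1", "2", "2", "1", "3")
)

# Reading 1: the raw 10 x 2 matrix of codes is itself treated as a table of counts
v_matrix <- cramers_v(data.matrix(toy))

# Reading 2: cross-tabulate the two variables first (a 3 x 3 table)
v_table <- cramers_v(table(toy$SEVERITY, toy$ATMOSPH_COND))

cat("data.matrix() used as a table:", round(v_matrix, 4), "\n")
cat("table() of the two columns:   ", round(v_table, 4), "\n")
```

Only the second reading measures the association between the two variables; the first depends on the arbitrary numeric codes in each row.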
I am using Shiny so that a user can select the categorical variables from dropdown menus and see the resulting Cramer's V. The app runs, but the values it reports are completely different from the ones the script above gives, even though both use the same data frame. Can anyone tell me why?
Here is an example of the R code using the Shiny package:
```r
library(shinydashboard)
library(shiny)
library(dplyr)
library(DT)
library(rcompanion)

df <- data.frame(ACCIDENT_MASTER_single)

Cat1.Variables <- c("SEVERITY", "ATMOSPH_COND", "DAY_OF_WEEK")
Cat2.Variables <- c("SEVERITY", "ATMOSPH_COND", "DAY_OF_WEEK")

ui <- fluidPage(
  titlePanel("Calculate the strength of the relationship between categorical variables"),
  sidebarLayout(
    sidebarPanel(
      selectInput("cat1", choices = Cat1.Variables, label = "Select a Categorical Variable:"),
      selectInput("cat2", choices = Cat2.Variables, label = "Select a Categorical Variable:")
    ),
    mainPanel(
      # renderPrint() output is displayed with verbatimTextOutput(), not tableOutput()
      verbatimTextOutput("results")
    )
  )
)

server <- function(input, output) {
  # Two-way contingency table of the two selected variables
  cramerdata <- reactive({
    req(input$cat1, input$cat2)
    df %>%
      {
        table(.[[input$cat1]], .[[input$cat2]])
      }
  })

  output$results <- renderPrint({
    cat("\nThe results equal: \n")
    print(cramerV(cramerdata()))
  })
}

shinyApp(ui, server)
```
I have also tested this on a number of other variable pairs, and the two approaches disagree for every one of them, not just for the two variables in this example. Any help would be much appreciated!
EDIT: Someone suggested I post the output of dput(head(ACCIDENT_MASTER_single)) (the full dataset is very large), so here it is. I hope this helps!
```r
> dput(head(ACCIDENT_MASTER_single))
structure(list(ACCIDENT_NO = c("T20150000004", "T20150000017",
"T20150000020", "T20150000028", "T20150000034", "T20150000052"
), ACCIDENTDATE = c("2015-01-01", "2015-01-01", "2015-01-01",
"2015-01-01", "2015-01-01", "2015-01-01"), ACCIDENTTIME = c("02:10:00",
"07:20:00", "06:51:00", "07:55:00", "17:10:00", "01:20:00"),
    ACCIDENT_TYPE = c(2L, 1L, 4L, 1L, 4L, 1L), DAY_OF_WEEK = c(5L,
    5L, 5L, 4L, 5L, 5L), DCA_CODE = c(108L, 130L, 173L, 135L,
    171L, 121L), DIRECTORY = c("MEL", "MEL", "MEL", "MEL", "MEL",
    "MEL"), LIGHT_CONDITION = c(3L, 1L, 2L, 1L, 1L, 3L), ROAD_GEOMETRY = c(5L,
    4L, 1L, 5L, 5L, 1L), SEVERITY = c(3L, 2L, 1L, 3L, 3L, 2L),
    SPEED_ZONE = c(60L, 70L, 70L, 100L, 60L, 60L), ROAD_TYPE = c("ROAD",
    "ROAD", "ROAD", "ROAD", "ROAD", "DRIVE"), ATMOSPH_COND = c("1",
    "1", "1", "1", "1", "1"), ATMOSPH_COND_SEQ = c("1", "1",
    "1", "0", "1", "1"), LGA_NAME = c("MOONEE VALLEY", "MONASH",
    "BAYSIDE", "BRIMBANK", "MELTON", "BRIMBANK"), DEG_URBAN_NAME = c("MELB_URBAN",
    "MELB_URBAN", "MELB_URBAN", "MELB_URBAN", "MELB_URBAN", "MELB_URBAN"
    ), Lat = c(-37.77922923, -37.88240078, -37.92909811, -37.76758102,
    -37.72427767, -37.76316596), Long = c(144.9309415, 145.0903658,
    145.0028103, 144.8002374, 144.7529804, 144.7897546), POSTCODE_NO = c(3032L,
    3148L, 3186L, 3022L, 3023L, 3023L), Surface.Cond.Desc = c("Dry",
    "Dry", "Dry", "Dry", "Dry", "Dry"), SURFACE_COND = c("1",
    "1", "1", "1", "1", "1"), SURFACE_COND_SEQ = c("1", "1",
    "1", "0", "1", "1"), ROAD_SURFACE_TYPE = c("1", "1,1", "1",
    "1,1", "1", "1,1"), VEHICLE_TYPE = c("99", "5,2", "1", "1,62",
    "1", "1,1"), TRAFFIC_CONTROL = c("0", "1,1", "0", "0,0",
    "0", "1,1"), EVENT_TYPE = c("C", "C", "3,C", "C,3,C,3,C",
    "3,C", "C"), SEX = c("M,U", "M,M", "M", "F,U", "M", "M,M,M,F"
    ), AGE = c("32,NA", "56,43", "28", "54,NA", "23", "17,16,19,41"
    ), Age.Group = c("30-39,unknown", "50-59,40-49", "26-29",
    "50-59,unknown", "22-25", "16-17,16-17,17-21,40-49"), INJ_LEVEL = c("3,4",
    "2,3", "1", "3,4", "3", "2,4,4,3"), ROAD_USER_TYPE = c("1,9",
    "2,2", "2", "2,2", "2", "3,3,2,2")), row.names = c(NA, 6L
), class = "data.frame")
```
Thanks
The result is working for me. Try setting the seed as well, and build the same data matrix inside the reactive:

```r
set.seed(1)

cramerdata <- reactive({
  req(input$cat1, input$cat2)
  # Same data.matrix() conversion as the standalone script
  df3 <- data.matrix(ACCIDENT_MASTER_single[c(input$cat1, input$cat2)])
  df3
})

output$results <- renderPrint({
  cat("\nThe results equal: \n")
  print(cramerV(cramerdata()))
})
```
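Two hedged notes on this answer. First, in rcompanion, `cramerV()` only resamples when bootstrap confidence intervals are requested (`ci = TRUE`), so with the default settings `set.seed()` should not change the value. Second, because this reactive returns the `data.matrix()` result, it reproduces the standalone script's number rather than the cross-tabulated value the original app computes. A minimal sketch, assuming rcompanion is installed and using made-up vectors in place of the accident data, of two calls that do cross-tabulate the variables and should therefore agree:

```r
library(rcompanion)

# Made-up observations standing in for two selected columns
sev  <- c(3, 2, 1, 3, 3, 2, 1, 2, 3, 3)
atmo <- c("1", "1", "2", "1", "3", "1", "2", "2", "1", "3")

# Form 1: two vectors of observations passed as x and y
v_xy <- cramerV(sev, atmo)

# Form 2: an explicit two-way contingency table, as the app's table() reactive builds
v_tab <- cramerV(table(sev, atmo))

# No ci = TRUE, so no bootstrap: a repeated call gives the same value without set.seed()
v_again <- cramerV(sev, atmo)

print(v_xy)
print(v_tab)
```

Inside the app, either form can be fed from `df[[input$cat1]]` and `df[[input$cat2]]`.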