I have created some weighted kernel density estimates (KDEs) across different factor levels, which I don't think can be incorporated into `geom_violin`'s own density estimates. Is there a way for `geom_violin`, or another ggplot2 function, to draw violin plots from my own precomputed densities rather than from its built-in density calculation? Any help would be much appreciated. Below is some example code; I would want to create a violin plot for each `Type` from the density values (`y`) across the range of `x` values in `Mean_Kernel_df`:
library(dplyr)   # for %>%, filter() and mutate()
library(ggplot2)
set.seed(1)      # arbitrary seed, so the example is reproducible
###Create data
Surge_vs_Plummet_Stats_df <- data.frame(
  Type = rep(c("Surge", "Plummet"), each = 50),
  site_no = sample(1:10, 100, replace = TRUE),
  Mean = c(rnorm(50, mean = 5, sd = 2), rnorm(50, mean = 3, sd = 1)))
###Calculate weights (1 / number of rows per site)
station_counts <- table(Surge_vs_Plummet_Stats_df$site_no)
## Index the counts by site name (not position) and drop the table class
Surge_vs_Plummet_Stats_df$Weights <- as.numeric(1 / station_counts[as.character(Surge_vs_Plummet_Stats_df$site_no)])
Surge_vs_Plummet_Stats_df$Weights <- Surge_vs_Plummet_Stats_df$Weights /
  sum(Surge_vs_Plummet_Stats_df$Weights) ## Normalize (sum to 1)
###Identify bandwidth
bw <- bw.nrd(Surge_vs_Plummet_Stats_df$Mean) ## Not 100% sure it's doing much
###Now separate the dfs and run the KDEs
Surge_vs_Plummet_Stats_Surge <- Surge_vs_Plummet_Stats_df %>% filter(Type == "Surge") %>% mutate(Weights = Weights / sum(Weights))
Surge_kde <- density(Surge_vs_Plummet_Stats_Surge$Mean, weights = Surge_vs_Plummet_Stats_Surge$Weights, bw = bw,
  from = min(Surge_vs_Plummet_Stats_df$Mean), to = max(Surge_vs_Plummet_Stats_df$Mean)) ## Deliberately the full df and not just surges
#
Surge_vs_Plummet_Stats_Plummet <- Surge_vs_Plummet_Stats_df %>% filter(Type == "Plummet") %>% mutate(Weights = Weights / sum(Weights))
Plummet_kde <- density(Surge_vs_Plummet_Stats_Plummet$Mean, weights = Surge_vs_Plummet_Stats_Plummet$Weights, bw = bw,
  from = min(Surge_vs_Plummet_Stats_df$Mean), to = max(Surge_vs_Plummet_Stats_df$Mean)) ## Deliberately the full df and not just plummets
##
Mean_Kernel_df <- data.frame(x = c(Surge_kde$x, Plummet_kde$x),
  y = c(Surge_kde$y, Plummet_kde$y),
  Type = c(rep("Surge", times = length(Surge_kde$x)),
           rep("Plummet", times = length(Plummet_kde$x))))
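The same per-`Type` KDEs can also be computed in one pass, which takes each `Type` label from the group itself instead of from separate `rep()` calls, so the labels cannot get out of step with the densities. This is a minimal sketch assuming the data setup above; `df`, `site_counts`, `kde_list` and `per_group_bw` are hypothetical names for illustration, and the seed is arbitrary. The last lines bear on the "not sure it's doing much" comment: without `bw = ...`, each `density()` call picks its own bandwidth (rule `bw.nrd0` by default, a different rule of thumb from `bw.nrd`), so passing `bw` does force both groups to be smoothed by the same amount.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
df <- data.frame(
  Type    = rep(c("Surge", "Plummet"), each = 50),
  site_no = sample(1:10, 100, replace = TRUE),
  Mean    = c(rnorm(50, mean = 5, sd = 2), rnorm(50, mean = 3, sd = 1))
)

# Inverse-count weights: look counts up by site name, then drop the table class
site_counts <- table(df$site_no)
df$Weights  <- as.numeric(1 / site_counts[as.character(df$site_no)])

bw  <- bw.nrd(df$Mean)  # one bandwidth shared by every group
rng <- range(df$Mean)   # one evaluation grid shared by every group

# One weighted KDE per Type; weights are renormalized within each group,
# and the Type label is taken from the group itself
kde_list <- lapply(split(df, df$Type), function(d) {
  k <- density(d$Mean, weights = d$Weights / sum(d$Weights),
               bw = bw, from = rng[1], to = rng[2])
  data.frame(x = k$x, y = k$y, Type = d$Type[1])
})
Mean_Kernel_df <- do.call(rbind, kde_list)
rownames(Mean_Kernel_df) <- NULL

# For comparison: without bw =, each density() call picks its own
# bandwidth (bw.nrd0 by default), so the groups would be smoothed differently
per_group_bw <- sapply(split(df$Mean, df$Type), function(m) density(m)$bw)
print(c(per_group_bw, shared = bw))
```

Each site's rows sum to a weight of 1 before the within-group renormalization, so sites with many rows do not dominate the estimate.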
You can typically use geoms without their associated statistical transformations (i.e. their `stat_*` layers) by using `stat = "identity"`. We can also do this for `geom_violin`, like so:
ggplot(Mean_Kernel_df, aes(Type, x, violinwidth = y)) +
geom_violin(stat = 'identity')
However, this will generate some warnings, and using `violinwidth` as an aesthetic is not documented, so this could conceivably break in the future. Also, the width is not scaled automatically, so you may want to use e.g. `violinwidth = y * 2`.
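A variation on the `stat = "identity"` approach, offered as a sketch rather than a guaranteed recipe: instead of a fixed multiplier such as `y * 2`, divide by the largest density so the widest violin spans the full width, which is comparable to what `geom_violin`'s default `scale = "area"` does. The two normal densities below are a hypothetical stand-in for `Mean_Kernel_df` (as are the names `grid` and `vw`), and the same caveat about the undocumented `violinwidth` aesthetic applies.

```r
library(ggplot2)

# Hypothetical stand-in for Mean_Kernel_df: two precomputed normal densities
grid <- seq(-2, 10, length.out = 512)
Mean_Kernel_df <- rbind(
  data.frame(x = grid, y = dnorm(grid, mean = 5, sd = 2), Type = "Surge"),
  data.frame(x = grid, y = dnorm(grid, mean = 3, sd = 1), Type = "Plummet")
)

# Scale by the overall maximum density so the widest violin spans the full
# width, comparable to stat_ydensity's default scale = "area"
Mean_Kernel_df$vw <- Mean_Kernel_df$y / max(Mean_Kernel_df$y)

# violinwidth is an undocumented aesthetic; ggplot2 may warn about it
p <- ggplot(Mean_Kernel_df, aes(Type, x, violinwidth = vw)) +
  geom_violin(stat = "identity")
print(p)
```

Because the scaling uses one maximum across both groups, the relative widths of the two violins still reflect their relative densities.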
Alternatively, maybe you are just looking for:
ggplot(Surge_vs_Plummet_Stats_df, aes(Type, Mean, weight = as.numeric(Weights))) +
geom_violin(bw = bw)
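If the built-in estimate is enough, two details may bring it closer to the hand-rolled KDEs. By default `geom_violin` trims each violin to its own group's data range, whereas the question evaluates the densities over the full range of `Mean`; `trim = FALSE` keeps the tails instead. And `layer_data()` returns the data the stat computed, so the densities can be inspected or compared with `density()` output. This is a minimal sketch assuming the question's data setup; `df`, `p` and `dens` are hypothetical names and the seed is arbitrary.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only for reproducibility
df <- data.frame(
  Type    = rep(c("Surge", "Plummet"), each = 50),
  site_no = sample(1:10, 100, replace = TRUE),
  Mean    = c(rnorm(50, mean = 5, sd = 2), rnorm(50, mean = 3, sd = 1))
)
# Numeric inverse-count weights, looked up by site name
df$Weights <- as.numeric(1 / table(df$site_no)[as.character(df$site_no)])
bw <- bw.nrd(df$Mean)  # shared numeric bandwidth, as in the question

# trim = FALSE keeps the density tails beyond each group's data range,
# closer to KDEs evaluated over the full range of Mean
p <- ggplot(df, aes(Type, Mean, weight = Weights)) +
  geom_violin(bw = bw, trim = FALSE)

# layer_data() exposes what stat_ydensity computed for this layer
dens <- layer_data(p)
head(dens[, c("x", "y", "density", "violinwidth")])
```

In the computed data, `y` holds the evaluation points and `density` the estimate at each point, which is the same layout as `Mean_Kernel_df`.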