Tags: r, ggplot2, ggboxplot

ggplot2 grouped boxplot doesn't separate groups for different iterations


This is my first time posting here after years of being an anonymous reader. Please be kind to me in case the format for posting questions is wrong.

My dataset stores, for each combination of iterations and particles, the associated llh (log-likelihood) and se values.

          llh           se Particles Iterations     Time
1         NaN           NA       500          5    7.222
2  -2087.0886  41.53552846      1000          5   14.149
3  -1903.6823 171.30398540      1500          5   19.488
4  -2474.3789           NA      2000          5   25.336
5  -1229.1886   1.33015305      3000          5   37.858
6  -1331.1882   9.66674817      5000          5   60.994
7  -2330.5701  35.17979986      7500          5   92.654
8  -1753.6308 137.62546543       500         10   13.891
9  -1468.1730  64.58164086      1000         10   26.474
10 -2221.8960  73.11124703      1500         10   37.651
11 -2606.5620  46.51251610      2000         10   51.719
12 -1301.0474  12.59814717      3000         10   75.776
13  -927.7820   0.18559457      5000         10  125.121
14 -1180.8230  10.55185851      7500         10  151.593
15 -3109.6442  55.29536888       500         15   15.997
16 -1959.0457  44.58603179      1000         15   39.391
17 -1268.8367  24.06368751      1500         15   58.382
18 -2832.5527           NA      2000         15   76.853
19  -845.2781   0.21124844      3000         15   99.497
20  -845.4272   0.02649884      5000         15  147.611
21 -1446.8511  17.06673528      7500         15  217.608

or if dput() is preferred:

> dput(logliks[1:21,])
structure(list(llh = c(NaN, -2087.08855486818, -1903.6823477862, 
-2474.37893002966, -1229.18856210967, -1331.18815912831, -2330.57009669248, 
-1753.63084316259, -1468.17297841903, -2221.89596236152, -2606.56196704478, 
-1301.0473771866, -927.782003670307, -1180.82300393742, -3109.64417468708, 
-1959.04572793909, -1268.83669965093, -2832.5527445189, -845.278087151579, 
-845.427210637555, -1446.85110262111), se = c(NA, 41.5355284568715, 
171.303985396005, NA, 1.33015305002498, 9.66674817155666, 35.1797998633679, 
137.625465433877, 64.5816408601655, 73.1112470277094, 46.5125161022654, 
12.5981471672579, 0.185594570806789, 10.5518585121374, 55.2953688797359, 
44.5860317855338, 24.0636875106622, NA, 0.21124844438021, 0.0264988432776242, 
17.0667352804977), Particles = c(500, 1000, 1500, 2000, 3000, 
5000, 7500, 500, 1000, 1500, 2000, 3000, 5000, 7500, 500, 1000, 
1500, 2000, 3000, 5000, 7500), Iterations = c(5, 5, 5, 5, 5, 
5, 5, 10, 10, 10, 10, 10, 10, 10, 15, 15, 15, 15, 15, 15, 15), 
    Time = c(7.222, 14.149, 19.488, 25.336, 37.858, 60.994, 92.654, 
    13.891, 26.474, 37.651, 51.719, 75.776, 125.121, 151.593, 
    15.997, 39.391, 58.382, 76.853, 99.497, 147.611, 217.608)), row.names = c(NA, 
21L), class = "data.frame")

I was trying to plot a box-plot and it is not grouping as expected. I tried discretizing the x-axis according to another post I found here, however it gives me the error "Error: Discrete value supplied to continuous scale".

Here's my code:

library(ggplot2)
library(ggthemes)
library(latex2exp)  # provides TeX()

g <- ggplot(logliks,aes(x=factor(Iterations), y=llh, group=Particles, fill=factor(Particles)))+
  geom_boxplot(position=position_dodge(1))+ 
  ylim(-4000,-400)+
  xlim(5,250)+
  theme(axis.text.x = element_text(angle=65, vjust=0.6))+ 
  labs(title="log Likelihoods", 
       subtitle = TeX(paste("For one guess of $\\epsilon$ and $\\kappa$ each")),
       caption="Likelihoods with respect to iterations and particles",
       x="Iterations",
       y ="log Likelihood",
       fill = paste("Particles"))+
  scale_fill_manual(values = colour_palette_parts)+
  guides(colour = guide_legend(override.aes = list(size=6,shape = 20),nrow=2))+
  theme_bw()+
  themespecs

I have defined my own colour palette colour_palette_parts

colour_palette_parts <- c("#ffbe0b", "#ff8e09", "#ff5d07", "#ff2b05", "#ff040e", "#ff023e", "#ff006e")

and also called library(latex2exp) for the LaTeX symbols in captions/subtitles.

Here's what I want: Taken from another website

Here's what I get using the above code, except not discretizing the x-axis (i.e. using ...aes(x=Iterations,... instead of ...aes(x=factor(Iterations),...).

I even get the error "position_dodge() requires non-overlapping x intervals".

The box plot is not grouping each number of particles for each iteration. Ideally, I would like to have 7 smaller box-plots corresponding to each iteration.

How can I separate them into little boxes? Kindly help me out. Thanks in advance!

Update: I have found out how to discretize the x-axis without the error: ...aes(x=factor(Iterations,levels=c(5,10,15,20,30,50,100,150,200,250)), y=llh,...

Now it generates an image, albeit without grouping. This is the updated image, which still lacks grouping by number of particles within each iteration.


Solution

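  • I think this will put the boxes where you want them. Two things in your code get in the way: the explicit group=Particles overrides ggplot2's default grouping, so you get one box per number of particles instead of one per Iterations/Particles combination, and xlim(5,250) adds a continuous x scale to a factor, which is what raises "Discrete value supplied to continuous scale". With both removed, geom_boxplot() already dodges the boxes automatically on a per-fill basis. In your example data frame there is only one data point per box, so each box collapses to a flat line, but I think with your full dataset it will look as you expect.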

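    library(ggplot2)
    library(latex2exp)  # TeX() renders the LaTeX in the subtitle

    # No group aesthetic and no xlim(): geom_boxplot() draws one box per
    # Iterations/Particles combination and dodges them by fill.
    ggplot(logliks, aes(x = factor(Iterations), y = llh, fill = factor(Particles))) +
      geom_boxplot() +
      labs(title = "log Likelihoods",
           subtitle = TeX("For one guess of $\\epsilon$ and $\\kappa$ each"),
           caption = "Likelihoods with respect to iterations and particles",
           x = "Iterations",
           y = "log Likelihood",
           fill = "Particles") +
      scale_fill_manual(values = colour_palette_parts) +
      theme_bw() +
      # theme_bw() is a complete theme, so axis tweaks must come after it
      theme(axis.text.x = element_text(angle = 65, vjust = 0.6))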
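
If you would rather keep an explicit group aesthetic, here is a minimal sketch of one way to do it, assuming the 21 rows posted in the question (llh rounded as in the printed table). It groups by the Iterations/Particles combination via interaction() and uses scale_x_discrete() as the discrete counterpart of xlim(). The names plot_data, n_by_particles, n_by_cell and obs_per_cell are only illustrative, and the non-finite llh row is dropped up front so the plot does not depend on how missing values are handled.

```r
library(ggplot2)

# The 21 rows from the question (llh rounded as in the printed table).
logliks <- data.frame(
  llh = c(NaN, -2087.0886, -1903.6823, -2474.3789, -1229.1886, -1331.1882,
          -2330.5701, -1753.6308, -1468.1730, -2221.8960, -2606.5620,
          -1301.0474, -927.7820, -1180.8230, -3109.6442, -1959.0457,
          -1268.8367, -2832.5527, -845.2781, -845.4272, -1446.8511),
  Particles  = rep(c(500, 1000, 1500, 2000, 3000, 5000, 7500), times = 3),
  Iterations = rep(c(5, 10, 15), each = 7)
)

# How many boxes each grouping produces:
# group = Particles pools all iterations into one box per particle count.
n_by_particles <- nlevels(factor(logliks$Particles))
# Grouping by the combination gives one box per Iterations/Particles cell.
n_by_cell <- nlevels(interaction(logliks$Iterations, logliks$Particles,
                                 drop = TRUE))

# Observations available for each box in this sample.
obs_per_cell <- table(logliks$Iterations, logliks$Particles)

# Drop the non-finite llh value before plotting.
plot_data <- logliks[is.finite(logliks$llh), ]

p <- ggplot(plot_data,
            aes(x = factor(Iterations), y = llh,
                fill = factor(Particles),
                group = interaction(Iterations, Particles))) +
  geom_boxplot() +
  # Discrete axis: set limits with scale_x_discrete(), not xlim(5, 250).
  scale_x_discrete(limits = c("5", "10", "15")) +
  labs(x = "Iterations", y = "log Likelihood", fill = "Particles")

print(p)
```

In this sample n_by_particles is 7 and n_by_cell is 21, and every cell of obs_per_cell is 1, which is why each box is drawn as a single line until the full dataset is used.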