Violin plot in ggplot2 with width taken from a column


I am pretty new to R and I only use it for visualization so I may be missing something simple.

What I want is simple: I have two columns that should map to the x and y axes, and a third column that should define the width of the plot at each y position. I have not gotten far with the code, even though I tried many things from different answers. This is as far as I got:

ggplot(disM, aes(x=study, y=value)) +
  geom_violin() +
  labs(title="Distribution", x="Studies", y="Ranges")

which is not really achieving anything.

I have a table like this:

      Col0         study     value
1    30-31 breast cancer    357263
2    32-33 breast cancer    352067
3    34-35 breast cancer    340264
4    36-37 breast cancer    309827
5    38-39 breast cancer    298684
6    40-41 breast cancer    322570
7    42-43 breast cancer    338480
8    44-45 breast cancer    354451
9    46-47 breast cancer    429183
10   48-49 breast cancer    396942
11   50-51 breast cancer    415195
12   52-53 breast cancer    368217
13   54-55 breast cancer    445884
14   56-57 breast cancer    395652
15   58-59 breast cancer    386643
16   60-61 breast cancer    461940
17   62-63 breast cancer    473772
18   64-65 breast cancer    464228
19   66-67 breast cancer    485851
20   68-69 breast cancer    513411
21   70-71 breast cancer    576618
22   72-73 breast cancer    588724
23   74-75 breast cancer    634343
24   76-77 breast cancer    584662
25   78-79 breast cancer    608901
26   80-81 breast cancer    617286
27   82-83 breast cancer    659318
28   84-85 breast cancer    757167
29   86-87 breast cancer   1044465
30   88-89 breast cancer    982901
31   90-91 breast cancer   1114269
32   92-93 breast cancer   1110257
33   94-95 breast cancer   1742966
34   96-97 breast cancer   6379974
35   98-99 breast cancer   3437746
36 100-101 breast cancer 118984063
37   30-31  renal cancer   1055566
38   32-33  renal cancer   1089405
39   34-35  renal cancer   1228087
40   36-37  renal cancer   1265606
41   38-39  renal cancer   1264919
42   40-41  renal cancer   1248949
43   42-43  renal cancer   1391738
44   44-45  renal cancer   1453100
45   46-47  renal cancer   1443915
46   48-49  renal cancer   1429785
47   50-51  renal cancer   1372041
48   52-53  renal cancer   1339706
49   54-55  renal cancer   1418135
50   56-57  renal cancer   1484162
51   58-59  renal cancer   1582617
52   60-61  renal cancer   1571977
53   62-63  renal cancer   1652503
54   64-65  renal cancer   1742230
55   66-67  renal cancer   1859936
56   68-69  renal cancer   1928028
57   70-71  renal cancer   2041783
58   72-73  renal cancer   2108994
59   74-75  renal cancer   2154244
60   76-77  renal cancer   2218430
61   78-79  renal cancer   2333206
62   80-81  renal cancer   2377262
63   82-83  renal cancer   2345651
64   84-85  renal cancer   2402114
65   86-87  renal cancer   2519284
66   88-89  renal cancer   2542761
67   90-91  renal cancer   2587606
68   92-93  renal cancer   2308279
69   94-95  renal cancer   2980927
70   96-97  renal cancer  14108950
71   98-99  renal cancer   2762116
72 100-101  renal cancer 211513230

The x axis should be the study column, the y axis should be Col0, and the width of the violin at each y position should come from the value column. I cannot split Col0 into individual values, as I only have the data as ranges.

Any pointers on what to check or how to do this would be appreciated. Sorry if I missed a similar question.

Thanks in advance


Solution

  • I'm going to take a guess. (If I'm right, you could also look for information about pyramid plots.)

    Reorder labels so that "100-101" really comes at the end:

    disM$Col0 <- factor(disM$Col0,levels=unique(disM$Col0))
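
    To see why this step matters, here is a small check on a made-up vector (`ranges` is a hypothetical stand-in for `disM$Col0`, with only a few of the ranges). By default `factor()` sorts its levels as strings, so "100-101" would end up first rather than last:

    ```r
    # Hypothetical stand-in for disM$Col0 (only a few of the ranges).
    ranges <- c("30-31", "32-33", "98-99", "100-101")

    # Default: levels are sorted as character strings, so "100-101" comes first.
    default_levels <- levels(factor(ranges))

    # With levels = unique(...): levels keep their order of first appearance.
    ordered_levels <- levels(factor(ranges, levels = unique(ranges)))

    print(default_levels)
    print(ordered_levels)
    ```

    This relies on the rows already being in the order you want on the axis, which is the case for the table in the question.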
    

    Rearrange to make it easier to draw polygons (I wish there was an easier way to do this, but can't think of one):

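    library(plyr)
    # For each study, trace a closed outline: up the left edge (-value/2),
    # then back down the right edge (+value/2), at the factor positions of Col0.
    disM2 <- ddply(disM,"study",
       function(dd) with(dd,
                 data.frame(y=c(as.numeric(Col0),rev(as.numeric(Col0))),
                            x=c(-value/2,rev(value/2)))))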
    
    
    library(ggplot2); theme_set(theme_bw())
    ggplot(disM2)+
        geom_polygon(aes(x,y),alpha=0.5)+
        facet_wrap(~study)+
        labs(title="Distribution")+
        # one break per range, labelled with the original range text
        scale_y_continuous(breaks=seq_along(levels(disM$Col0)),
                           labels=levels(disM$Col0))+
        scale_x_continuous(labels=NULL)
    

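
    If you would rather not load plyr, the same rearrangement can probably be done in base R. The following is only a sketch on made-up data: `toy`, `toy2`, and the helper `outline()` are hypothetical names, and the values are illustrative, not taken from the question. The plotting part runs only if ggplot2 is installed.

    ```r
    # Small made-up data in the same shape as disM (values are illustrative only).
    toy <- data.frame(
      Col0  = c("30-31", "32-33", "34-35", "30-31", "32-33", "34-35"),
      study = rep(c("A", "B"), each = 3),
      value = c(10, 20, 30, 5, 15, 25),
      stringsAsFactors = FALSE
    )
    toy$Col0 <- factor(toy$Col0, levels = unique(toy$Col0))

    # Hypothetical helper: build one closed outline per study,
    # going up the left edge and back down the right edge.
    outline <- function(dd) {
      pos <- as.numeric(dd$Col0)
      data.frame(study = dd$study[1],
                 y = c(pos, rev(pos)),
                 x = c(-dd$value / 2, rev(dd$value / 2)))
    }

    # Split by study, build each outline, and stack the pieces.
    toy2 <- do.call(rbind, lapply(split(toy, toy$study), outline))
    rownames(toy2) <- NULL

    # Optional: draw it the same way as above, if ggplot2 is available.
    if (requireNamespace("ggplot2", quietly = TRUE)) {
      library(ggplot2)
      p <- ggplot(toy2, aes(x, y)) +
        geom_polygon(alpha = 0.5) +
        facet_wrap(~study) +
        scale_y_continuous(breaks = seq_along(levels(toy$Col0)),
                           labels = levels(toy$Col0))
      print(p)
    }
    ```

    Each study contributes twice as many rows as it had originally, because every range appears once on the left edge and once on the right edge of the polygon.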