
gnuplot: boxplot over several files


In Gnuplot v5.4.2 I'd like to make boxplots for several data columns from different files, combined in one multi-boxplot with the same y-value range.

What I can do is combine the data into one file; however, the columns do not have the same number of entries. Can I just specify NaN (not a number) or some other value so that Gnuplot ignores it? For example, when the combined file looks like:

"label 1"  "label 2"  "label 3"
 12.3        44.2       13.3
 12.4        12.5       14.4
 11.6        13.7       NaN
 NaN         15.7       NaN

I had a look at http://www.gnuplot.info/demo_5.2/boxplot.html, but there the labels for files like silver.dat are hard-coded in the gnuplot script, and all data has the same number of rows.

Thank you.


Edit:

Stitching together pieces from boxplot.html, I now have this:

$DATA <<EOF
"label 1"  "label 2"  "label 3"
 11.3        14.2       11.3
 12.3        44.2       16.3
 1.4        12.5       14.4
 16.4        17.5       17.4
 17.4        12.5       14.4
 12.4        12.5       14.4
 11.6        13.7       NaN
 NaN         15.7       NaN
EOF

set terminal png size 400,300;
set output "box.png";

set key autotitle columnhead

factors = "\"label 1\"  \"label 2\"  \"label 3\""
NF = words(factors)

# No legend
unset key

# Solid box-and-whiskers where the whiskers extend 0%...100%
set style data boxplot
set style fill solid 0.5 border -1
set style boxplot fraction 1

set boxwidth  0.6
set xtic ("" 0)

set for [i=1:NF] xtics add (word(factors,i) i)

plot $DATA using (1):1, '' using (2):2, '' using (3):3 ;

Which produces:

[Image: boxplot generated by the above Gnuplot script]

which is pretty close to what I am after. For multiple files, I would use something like:

...
set xtics add ("label 1" 1)
set xtics add ("label 2" 2)

set yrange[-0.4 : *]
set xrange[0.5 : 2.5]

plot "file1.data" using (1):(23+$5) , \
     "file2.data" using (2):(23+$5);
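The two explicit `plot` entries above can be generalized with a `for` loop over a list of file names, so that adding a file only means extending the list. A sketch, assuming (as in the snippet above) that every file keeps its values in column 5 and needs the same offset of 23; `FILES` and `LABELS` are hypothetical names, and the labels use underscores so that each one is a single word for `word()`:

```gnuplot
# Sketch: one boxplot per file, driven by a list of file names.
# FILES and LABELS are hypothetical placeholders; column 5 and the
# offset of 23 are taken from the two-file example above.
FILES  = "file1.data file2.data"
LABELS = "label_1 label_2"
N      = words(FILES)

set style data boxplot
set style fill solid 0.5 border -1
set style boxplot fraction 1
set boxwidth 0.6
unset key

# Replace the default tics, then add one labelled tic per file at x = i
set xtics ("" 0)
set for [i=1:N] xtics add (word(LABELS,i) i)

set xrange [0.5 : N+0.5]
set yrange [-0.4 : *]

plot for [i=1:N] word(FILES,i) using (i):(23+$5)
```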

Solution

  • Assuming I understood your question correctly, you want to plot boxplots from different files and different columns in one graph.

    Somehow you have to specify the file names and columns, e.g. in a string list. So far, I haven't managed to use xticlabels together with the boxplot plotting style. So each file is "plotted" a second time (actually, NaN is plotted, i.e. nothing) in order to get the corresponding column headers as tic labels.

    For further reading, check help boxplot, help word, help words, help xticlabels, help columnhead.

    Edit: added mean values to the graph.

    Check help stats, help arrays, help vectors.

    Data:

    SO78599002_1.dat

    # SO78599002_1
    File1Col1  File1Col2  File1Col3  File1Col4
     1         12         13         14
     2         22         23         24
     3         32         33         34
     4         42         43         44
     5         72         73         74
     6         82         83         84
     7         92         93         94
    

    SO78599002_2.dat

    # SO78599002_2
    File2Col1  File2Col2  File2Col3
     1         12         13
     2         22         23
     3         32         33
     4         72         73
     5         82         83
    

    SO78599002_3.dat

    # SO78599002_3
    File3Col1  File3Col2
     1         12
     2         22
     3         32
     4         42
     5         62
     6         72
    

    Script:

    ### boxplots from different files and selected columns plus mean values
    reset session
    
    FILES   = "SO78599002_1.dat   SO78599002_2.dat   SO78599002_3.dat"
    COLUMNS = "4   3   2"
    
    N       = words(FILES)
    File(i) = word(FILES,i) 
    Col(i)  = int(word(COLUMNS,i))
    
    set style fill solid 0.3
    set key noautotitle
    
    array MEANS[N] # set array size
    do for [i=1:N] {
        stats File(i) u Col(i) nooutput
        MEANS[i] = STATS_mean
    }
    
    plot for [i=1:N] File(i)  u (i):Col(i) w boxplot, \
         for [i=1:N] File(i)  u (i):(NaN):xtic(columnhead(Col(i))) w p, \
         MEANS u ($0+1-0.25):2:(0.5):(0) w vec dt 2 lc "black" nohead ti "mean value" 
    ### end of script
    

    Result:

    [Image: resulting boxplots with mean values]
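gnuplot skips `NaN` entries as missing values, so the NaN-padded single-file layout from the question also works. A sketch that applies the same `xtic(columnhead(...))` trick to the question's `$DATA` block, so the tic labels come from the header row instead of a hard-coded list; the column count `N = 3` is set by hand here and is an assumption about that layout:

```gnuplot
# Sketch: NaN-padded columns in one datablock, tic labels read from the header row.
# The values are copied from the $DATA block in the question.
reset session

$DATA <<EOD
"label 1"  "label 2"  "label 3"
 11.3        14.2       11.3
 12.3        44.2       16.3
  1.4        12.5       14.4
 16.4        17.5       17.4
 17.4        12.5       14.4
 12.4        12.5       14.4
 11.6        13.7       NaN
 NaN         15.7       NaN
EOD

N = 3    # number of data columns in $DATA (set by hand)

set style fill solid 0.5 border -1
set style boxplot fraction 1
set boxwidth 0.6
set key noautotitle

# First loop draws the boxes (NaN entries are skipped as missing values);
# the second loop plots nothing (NaN) but uses the column headers as xtic labels.
plot for [i=1:N] $DATA u (i):i w boxplot, \
     for [i=1:N] $DATA u (i):(NaN):xtic(columnhead(i)) w p
```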