plot, data-visualization, bar-chart, gnuplot, blast

Gnuplot bar chart with custom intervals on the x-axis


I'm new to gnuplot and I would like to replicate this plot: https://images.app.goo.gl/DqygL2gfk3jZ7jsK6

I have a file.dat with continuous values between 0 and 100 and I would like to plot them subdivided into intervals (pident > 98, 90 < pident < 100, etc.), with the total number of occurrences on the y-axis.

I have looked everywhere for a way to do this, but I still cannot get it to work.

Thank you! Here is a sample of the data, with the value and the count:

33.18 5
43.296 1
33.19 1
27.168 5
71.429 11
30.698 9
47.934 1
43.299 3
30.699 3
37.092 2
24.492 2
24.493 2
24.494 7
47.938 1
24.497 1
37.097 8
37.099 2
33.824 7
51.111 15
59.025 2
62.553 2
62.554 2
57.867 2
33.826 2
62.555 1
33.827 5
62.556 2
33.828 1
59.028 1
46.429 11
51.117 1
75.158 2
27.621 1
27.623 1
27.624 2
37.5 113
37.6 2
32.313 8
27.626 3
37.7 3
32.314 1
67.797 3
27.628 2
32.316 2
37.9 1
61.044 1
43.81 5
32.317 8
32.318 2
43.82 4
32.319 2
43.83 2
37.551 3
61.048 1
48.993 6
29.43 2

This is the code I have tried so far (where I also calculate the mean):

#!/usr/bin/gnuplot -persist
set noytics

# Find the mean
mean= system("awk '{sum+=$1*$2; tot+=$2} END{print sum/tot}' hist.dat")

set arrow 1 from mean,0 to mean, graph 1 nohead ls 1 lc rgb "blue"
set label 1 sprintf(" Mean: %s", mean) at mean, screen 0.1

# Histogram
binwidth=10
bin(x,width)=width*floor(x/width)
plot 'hist.dat' using (bin($1,binwidth)):(1.0) smooth freq with boxes

This is the result: [image]


Solution

  • The following script takes your data and sums up the second column within the defined bins. If the first column contains values equal to 100, they end up in the bin 100-<110, which lies outside the xrange [0:100] set in the script.

    With Bin(x) = floor(x/BinWidth)*BinWidth + BinWidth*0.5, each value is mapped to the center of its bin. The shift by half a bin width makes each box span the x-axis from the beginning of its bin to the end of its bin (instead of being centered at the beginning of the respective bin).
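
    To see where individual values end up, the bin function can be evaluated directly at the gnuplot prompt. This is only a small sketch that repeats the two definitions from the script; the sample values 33.18 and 37.5 are taken from the data, and 100 is the edge case mentioned above.

    ### check the bin centers for a few values (sketch)
    BinWidth = 10
    Bin(x)   = floor(x/BinWidth)*BinWidth + BinWidth*0.5

    # 33.18 and 37.5 both fall into the bin 30-<40, centered at 35
    print Bin(33.18), Bin(37.5)
    # a value of exactly 100 falls into the bin 100-<110, centered at 105
    print Bin(100)
    ### end of code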

    If you explicitly want xtic labels like those in the example graph you've shown, i.e. 10-<20, 20-<30 etc., you would have to fiddle around with the xtic labels.
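
    One possible way to do that fiddling is sketched here, assuming gnuplot 5 or later, BinWidth = 10 and ten bins covering 0 to 100. The label format "%d-<%d" and the loop limit of 9 are assumptions for illustration. These lines would take the place of set xtics 10 out in the script.

    ### one label per bin, placed at the bin center (sketch)
    BinWidth = 10
    set xtics out

    # an explicit tic list replaces the automatic tics; start with the first bin
    set xtics (sprintf("%d-<%d", 0, BinWidth) BinWidth*0.5)
    # add the labels for the remaining bins 10-<20 ... 90-<100
    set for [i=1:9] xtics add (sprintf("%d-<%d", i*BinWidth, (i+1)*BinWidth) BinWidth*(i+0.5))
    ### end of code

    Since the vertical grid lines follow the tic positions, they would then run through the bin centers; using set grid ytics instead of set grid x,y avoids that.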

    Edit: I forgot the mean value. There is no need to call awk; gnuplot can compute it for you as well, check help stats.

    Code:

    ### create histogram
    reset session
    
    $Data <<EOD
    33.18 5
    43.296 1
    33.19 1
    27.168 5
    71.429 11
    30.698 9
    47.934 1
    43.299 3
    30.699 3
    37.092 2
    24.492 2
    24.493 2
    24.494 7
    47.938 1
    24.497 1
    37.097 8
    37.099 2
    33.824 7
    51.111 15
    59.025 2
    62.553 2
    62.554 2
    57.867 2
    33.826 2
    62.555 1
    33.827 5
    62.556 2
    33.828 1
    59.028 1
    46.429 11
    51.117 1
    75.158 2
    27.621 1
    27.623 1
    27.624 2
    37.5 113
    37.6 2
    32.313 8
    27.626 3
    37.7 3
    32.314 1
    67.797 3
    27.628 2
    32.316 2
    37.9 1
    61.044 1
    43.81 5
    32.317 8
    32.318 2
    43.82 4
    32.319 2
    43.83 2
    37.551 3
    61.048 1
    48.993 6
    29.43 2
    EOD
    
    # Histogram
    BinWidth = 10
    Bin(x)   = floor(x/BinWidth)*BinWidth + BinWidth*0.5
    
    # Mean
    stats $Data u ($1*$2):2 nooutput
    mean = STATS_sum_x/STATS_sum_y
    set arrow 1 from mean, graph 0 to mean, graph 1 nohead lw 2 lc rgb "red" front
    set label 1 sprintf("Mean: %.1f", mean) at mean, graph 1 offset 1,-0.7
    
    set xlabel "Identity / %"
    set xrange [0:100]
    set xtics 10 out
    set ylabel "The number of blast hits"
    set style fill solid 0.3
    set boxwidth BinWidth
    set key noautotitle
    set grid x,y
    
    plot $Data using (Bin($1)):2 smooth freq with boxes lc "blue"
    ### end of code
    

    Result: [image]