I'm new using gnuplot and i would like to replicate this plot: https://images.app.goo.gl/DqygL2gfk3jZ7jsK6
I have a file.dat with continuous value between 0 and 100 and i would like to plot it, subdivided in intervals ( pident> 98, 90 < pident < 100...) Etc. And on y-axis the total occurrences.
I looked everywhere finding a way but still I cannot do it.
Thank you ! sample of the data, with the value and the counts:
33.18 5
43.296 1
33.19 1
27.168 5
71.429 11
30.698 9
47.934 1
43.299 3
30.699 3
37.092 2
24.492 2
24.493 2
24.494 7
47.938 1
24.497 1
37.097 8
37.099 2
33.824 7
51.111 15
59.025 2
62.553 2
62.554 2
57.867 2
33.826 2
62.555 1
33.827 5
62.556 2
33.828 1
59.028 1
46.429 11
51.117 1
75.158 2
27.621 1
27.623 1
27.624 2
37.5 113
37.6 2
32.313 8
27.626 3
37.7 3
32.314 1
67.797 3
27.628 2
32.316 2
37.9 1
61.044 1
43.81 5
32.317 8
32.318 2
43.82 4
32.319 2
43.83 2
37.551 3
61.048 1
48.993 6
29.43 2
This is the code tried so far (where i also calculate the mean):
#!/usr/bin/gnuplot -persist
set noytics
# Find the mean
mean= system("awk '{sum+=$1*$2; tot+=$2} END{print sum/tot}' hist.dat")
set arrow 1 from mean,0 to mean, graph 1 nohead ls 1 lc rgb "blue"
set label 1 sprintf(" Mean: %s", mean) at mean, screen 0.1
# Histogram
binwidth=10
bin(x,width)=width*floor(x/width)
plot 'hist.dat' using (bin($1,binwidth)):(1.0) smooth freq with boxes
The following script takes your data and sums up the second column within the defined bins.
If you have values of equal 100 in the first column, those values would be in the bin 100-<110
.
With Bin(x) = floor(x/BinWidth)*BinWidth + BinWidth*0.5
, the bins are shifted by half a binwidth to let the boxes on the x-axis range from the beginning of the bin to the end of the bin (and not centered at the beginning of the respective bin).
If you explicitely want to have xtics labels like in the example graph you've shown, i.e. 10-<20
, 20-<30
etc. you would have to fiddle around with the xtic labels.
Edit: Forgot the mean value. There is no need for calling awk. Gnuplot can do this for you as well, check help stats
.
Code:
### create histogram
reset session
$Data <<EOD
33.18 5
43.296 1
33.19 1
27.168 5
71.429 11
30.698 9
47.934 1
43.299 3
30.699 3
37.092 2
24.492 2
24.493 2
24.494 7
47.938 1
24.497 1
37.097 8
37.099 2
33.824 7
51.111 15
59.025 2
62.553 2
62.554 2
57.867 2
33.826 2
62.555 1
33.827 5
62.556 2
33.828 1
59.028 1
46.429 11
51.117 1
75.158 2
27.621 1
27.623 1
27.624 2
37.5 113
37.6 2
32.313 8
27.626 3
37.7 3
32.314 1
67.797 3
27.628 2
32.316 2
37.9 1
61.044 1
43.81 5
32.317 8
32.318 2
43.82 4
32.319 2
43.83 2
37.551 3
61.048 1
48.993 6
29.43 2
EOD
# Histogram
BinWidth = 10
Bin(x) = floor(x/BinWidth)*BinWidth + BinWidth*0.5
# Mean
stats $Data u ($1*$2):2 nooutput
mean = STATS_sum_x/STATS_sum_y
set arrow 1 from mean, graph 0 to mean, graph 1 nohead lw 2 lc rgb "red" front
set label 1 sprintf("Mean: %.1f", mean) at mean, graph 1 offset 1,-0.7
set xlabel "Identity / %"
set xrange [0:100]
set xtics 10 out
set ylabel "The number of blast hits"
set style fill solid 0.3
set boxwidth BinWidth
set key noautotitle
set grid x,y
plot $Data using (Bin($1)):2 smooth freq with boxes lc "blue"
### end of code