I have this script to plot data from a CSV file using gnuplot. I want to add 3 vertical lines at different times on the plot to show where I changed the workload of my experiment. I was trying to do it with vector
but it was messing the data already plotted. I attached my chart and added manually the vertical blue line as an example of what I want.
#!/usr/bin/gnuplot
# set grid
set key under left maxrows 1
set style line 1 lc rgb '#E02F44' lt 1 lw 1 ps 0.5 pt 7 # input throughput
set style line 2 lc rgb '#FF780A' lt 1 lw 1 ps 0.5 pt 1 # output throughput
set style line 3 lc rgb '#56A64B' lt 1 lw 1 ps 0.5 pt 2 # average processing latency
set style line 4 lc rgb '#000000' lt 1 lw 1 ps 0.5 pt 3 # 99th percentile processing latency
set terminal pdf
set pointintervalbox 0
set datafile separator ','
set output "efficiency-throughput-networkbuffer-baseline-TaxiRideNYC-100Kpersec.pdf"
set title "Throughput vs. processing latency consuming 50K r/s from the New York City (TLC)"
set xlabel "time (minutes)"
set ylabel "Throughput (K rec/sec)"
set y2label "processing latency (seconds)"
set ytics nomirror
set y2tics 0, 1
set xdata time # tells gnuplot the x axis is time data
set timefmt "%Y-%m-%d %H:%M:%S" # specify our time string format
set format x "%M" # otherwise it will show only MM:SS
plot "throughput-latency-increasing.csv" using 1:(column(2)/1000) title "IN throughput" with linespoints ls 1 axis x1y1 \
, "throughput-latency-increasing.csv" using 1:(column(10)/1000) title "OUT throughput" with linespoints ls 2 axis x1y1 \
, "throughput-latency-increasing.csv" using 1:(column(18)/1000) title "avg. latency" with linespoints ls 3 axis x1y2 \
, "throughput-latency-increasing.csv" using 1:(column(26)/1000) title "99th perc. latency" with linespoints ls 4 axis x1y2 \
#, "" using 1:($1):(3):(0) notitle with vectors nohead
My data file is:
"Time","pre_aggregate[0]-IN","pre_aggregate[1]-IN","pre_aggregate[2]-IN","pre_aggregate[3]-IN","pre_aggregate[4]-IN","pre_aggregate[5]-IN","pre_aggregate[6]-IN","pre_aggregate[7]-IN","pre_aggregate[0]-OUT","pre_aggregate[1]-OUT","pre_aggregate[2]-OUT","pre_aggregate[3]-OUT","pre_aggregate[4]-OUT","pre_aggregate[5]-OUT","pre_aggregate[6]-OUT","pre_aggregate[7]-OUT","pre_aggregate[0]-50","pre_aggregate[1]-50","pre_aggregate[2]-50","pre_aggregate[3]-50","pre_aggregate[4]-50","pre_aggregate[5]-50","pre_aggregate[6]-50","pre_aggregate[7]-50","pre_aggregate[0]-99","pre_aggregate[1]-99","pre_aggregate[2]-99","pre_aggregate[3]-99","pre_aggregate[4]-99","pre_aggregate[5]-99","pre_aggregate[6]-99","pre_aggregate[7]-99"
"2020-04-27 10:31:00",1428.05,1274.4666666666667,1364.6166666666666,1384.4666666666667,1327.3,1376.5,1390.9166666666667,1418.35,1428.05,1274.4666666666667,1364.6333333333334,1384.4666666666667,1327.3,1376.5,1390.9166666666667,1418.35,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1
"2020-04-27 10:31:15",1463.5833333333333,1452.3666666666666,1346.7333333333333,1380.3833333333334,1429.4833333333333,1431.6833333333334,1442.85,1425.15,1463.5833333333333,1452.3666666666666,1346.7333333333333,1380.3833333333334,1429.4833333333333,1431.6833333333334,1442.85,1425.15,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1
"2020-04-27 10:31:30",1393.4666666666667,1396.65,1369.55,1381.3833333333334,1336.8,1434.5166666666667,1440.0833333333333,1399.2833333333333,1393.45,1396.65,1369.55,1381.3833333333334,1336.8,1434.5166666666667,1440.0833333333333,1399.2833333333333,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1
"2020-04-27 10:31:45",1404.8833333333334,1448.5333333333333,1313.9,1308.1,1359.6333333333334,1329.5166666666667,1338.4166666666667,1481.5666666666666,1404.8833333333334,1448.5333333333333,1313.9,1308.1,1359.6333333333334,1329.5166666666667,1338.4166666666667,1481.5833333333333,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1
Of course you can plot your lines and labels. In the example below I'm using the newer syntax compared to set xdata time
. Which requires timecolumn(1,myTimeFmt)
and e.g. set format x "%M" timedate
.
Your date is in double quotes, so you have to define the timeformat using single quotes including the double quotes.
Furthermore, you are using absolute times, so your lines ideally use the same format. You can put it into a datablock. The script below uses some random data for illustration. I hope you can adapt the script to your needs.
Edit: variable labels added in column 2
Script:
### vertical lines with labels on time axis
reset session
$myLines <<EOD
"2020-04-27 10:34:00" "Workload\nchanged"
"2020-04-27 10:39:20" "Something else\nhappened"
"2020-04-27 10:43:50" "Unkown\nevent"
"2020-04-27 10:48:00" "Workload\nchanged"
EOD
myTimeFmt = '"%Y-%m-%d %H:%M:%S"'
StartDate = '"2020-04-27 10:30:00"'
EndDate = '"2020-04-27 10:52:00"'
set format x "%M" time
set xrange [strptime(myTimeFmt,StartDate):strptime(myTimeFmt,EndDate)]
set tmargin screen 0.90
set key noautotitle
yLow = 1.4
yHigh = 3.5
plot '+' u (strptime(myTimeFmt,StartDate)+$0*60):(rand(0)*3+0.5) w l lc rgb "red", \
$myLines u (timecolumn(1,myTimeFmt)):(yHigh):2 w labels right offset -0.5,1.5, \
$myLines u (timecolumn(1,myTimeFmt)):(yLow):(0):(yHigh-yLow) w vec lc rgb "blue" lw 2 nohead
### end of script
Result: