gnuplot

How to add vertical lines with labels using gnuplot?


I have this script to plot data from a CSV file using gnuplot. I want to add three vertical lines at different times on the plot to show where I changed the workload of my experiment. I tried to do it with vectors, but that messed up the data already plotted. I attached my chart, with a vertical blue line added by hand as an example of what I want.

#!/usr/bin/gnuplot

# set grid
set key under left maxrows 1
set style line 1 lc rgb '#E02F44' lt 1 lw 1 ps 0.5 pt 7 # input throughput
set style line 2 lc rgb '#FF780A' lt 1 lw 1 ps 0.5 pt 1 # output throughput
set style line 3 lc rgb '#56A64B' lt 1 lw 1 ps 0.5 pt 2 # average processing latency
set style line 4 lc rgb '#000000' lt 1 lw 1 ps 0.5 pt 3 # 99th percentile processing latency

set terminal pdf
set pointintervalbox 0
set datafile separator ','
set output "efficiency-throughput-networkbuffer-baseline-TaxiRideNYC-100Kpersec.pdf"
set title "Throughput vs. processing latency consuming 50K r/s from the New York City (TLC)"
set xlabel "time (minutes)"
set ylabel "Throughput (K rec/sec)"
set y2label "processing latency (seconds)"
set ytics nomirror
set y2tics 0, 1
set xdata time # tells gnuplot the x axis is time data
set timefmt "%Y-%m-%d %H:%M:%S" # specify our time string format
set format x "%M" # otherwise it will show only MM:SS
plot "throughput-latency-increasing.csv" using 1:(column(2)/1000) title "IN throughput" with linespoints ls 1 axis x1y1 \
, "throughput-latency-increasing.csv" using 1:(column(10)/1000) title "OUT throughput" with linespoints ls 2 axis x1y1 \
, "throughput-latency-increasing.csv" using 1:(column(18)/1000) title "avg. latency" with linespoints ls 3 axis x1y2 \
, "throughput-latency-increasing.csv" using 1:(column(26)/1000) title "99th perc. latency" with linespoints ls 4 axis x1y2 \
#, "" using 1:($1):(3):(0) notitle with vectors nohead

My data file is:

"Time","pre_aggregate[0]-IN","pre_aggregate[1]-IN","pre_aggregate[2]-IN","pre_aggregate[3]-IN","pre_aggregate[4]-IN","pre_aggregate[5]-IN","pre_aggregate[6]-IN","pre_aggregate[7]-IN","pre_aggregate[0]-OUT","pre_aggregate[1]-OUT","pre_aggregate[2]-OUT","pre_aggregate[3]-OUT","pre_aggregate[4]-OUT","pre_aggregate[5]-OUT","pre_aggregate[6]-OUT","pre_aggregate[7]-OUT","pre_aggregate[0]-50","pre_aggregate[1]-50","pre_aggregate[2]-50","pre_aggregate[3]-50","pre_aggregate[4]-50","pre_aggregate[5]-50","pre_aggregate[6]-50","pre_aggregate[7]-50","pre_aggregate[0]-99","pre_aggregate[1]-99","pre_aggregate[2]-99","pre_aggregate[3]-99","pre_aggregate[4]-99","pre_aggregate[5]-99","pre_aggregate[6]-99","pre_aggregate[7]-99"
"2020-04-27 10:31:00",1428.05,1274.4666666666667,1364.6166666666666,1384.4666666666667,1327.3,1376.5,1390.9166666666667,1418.35,1428.05,1274.4666666666667,1364.6333333333334,1384.4666666666667,1327.3,1376.5,1390.9166666666667,1418.35,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1
"2020-04-27 10:31:15",1463.5833333333333,1452.3666666666666,1346.7333333333333,1380.3833333333334,1429.4833333333333,1431.6833333333334,1442.85,1425.15,1463.5833333333333,1452.3666666666666,1346.7333333333333,1380.3833333333334,1429.4833333333333,1431.6833333333334,1442.85,1425.15,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1
"2020-04-27 10:31:30",1393.4666666666667,1396.65,1369.55,1381.3833333333334,1336.8,1434.5166666666667,1440.0833333333333,1399.2833333333333,1393.45,1396.65,1369.55,1381.3833333333334,1336.8,1434.5166666666667,1440.0833333333333,1399.2833333333333,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1
"2020-04-27 10:31:45",1404.8833333333334,1448.5333333333333,1313.9,1308.1,1359.6333333333334,1329.5166666666667,1338.4166666666667,1481.5666666666666,1404.8833333333334,1448.5333333333333,1313.9,1308.1,1359.6333333333334,1329.5166666666667,1338.4166666666667,1481.5833333333333,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1

[Image: the chart, with a manually added vertical blue line]


Solution

  • You can plot the lines and labels together with your data. The example below uses the newer time syntax rather than set xdata time; it requires timecolumn(1,myTimeFmt) in the using specification and, e.g., set format x "%M" timedate (abbreviated to time in the script). Your dates are enclosed in double quotes, so you have to define the time format in single quotes with the double quotes inside it. Furthermore, you are using absolute times, so the positions of your lines should ideally use the same format; you can put them into a datablock. The script below uses random data for illustration. I hope you can adapt it to your needs.

    Edit: variable labels added in column 2

    Script:

    ### vertical lines with labels on time axis
    reset session
    
    $myLines <<EOD
    "2020-04-27 10:34:00" "Workload\nchanged"
    "2020-04-27 10:39:20" "Something else\nhappened"
    "2020-04-27 10:43:50" "Unknown\nevent"
    "2020-04-27 10:48:00" "Workload\nchanged"
    EOD
    
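    # the time format includes the double quotes that surround the dates in the data
    myTimeFmt = '"%Y-%m-%d %H:%M:%S"'
    StartDate = '"2020-04-27 10:30:00"'
    EndDate   = '"2020-04-27 10:52:00"'
    
    set format x "%M" time    # show minutes on the x tics
    set xrange [strptime(myTimeFmt,StartDate):strptime(myTimeFmt,EndDate)]
    set tmargin screen 0.90   # top of the plot at 0.90 of the canvas height
    set key noautotitle
    
    # y-range covered by the vertical lines; the labels are placed at yHigh
    yLow  = 1.4
    yHigh = 3.5
    
    # 1st: random demo data (one sample per minute), 2nd: labels, 3rd: vertical lines
    plot '+' u (strptime(myTimeFmt,StartDate)+$0*60):(rand(0)*3+0.5) w l lc rgb "red", \
         $myLines u (timecolumn(1,myTimeFmt)):(yHigh):2 w labels right offset -0.5,1.5, \
         $myLines u (timecolumn(1,myTimeFmt)):(yLow):(0):(yHigh-yLow) w vec lc rgb "blue" lw 2 nohead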
    ### end of script
    

    Result:

    [Image: the resulting plot, with labeled vertical blue lines]
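
    If you would rather keep the set xdata time setup from the question, a possible alternative is to draw the markers with set arrow and set label instead of plotting them as vectors and labels as in the script above. The lines below are only a sketch under a few assumptions: gnuplot 5.2 or newer (for arrays), and placeholder event times and texts (EventTime, EventText, and evFmt are names made up for this illustration). The fragment would go before the plot command of the question's script; since nothing is added to the plot command itself, the curves already plotted should not be affected.

    ```gnuplot
    ### sketch: vertical marker lines via "set arrow" (not the plotted-vectors method)
    # Assumption: gnuplot >= 5.2 (arrays). Times and texts below are placeholders.
    evFmt = "%Y-%m-%d %H:%M:%S"
    array EventTime[3] = ["2020-04-27 10:34:00", "2020-04-27 10:39:20", "2020-04-27 10:43:50"]
    array EventText[3] = ["Workload changed", "Something else happened", "Unknown event"]

    do for [i=1:|EventTime|] {
        t   = strptime(evFmt, EventTime[i])   # event time in seconds
        txt = EventText[i]
        # x in axis (time) coordinates, y in graph coordinates: bottom (0) to top (1)
        set arrow i from t, graph 0 to t, graph 1 nohead lc rgb "blue" lw 2
        # label to the left of the line, just below the top border
        set label i txt at t, graph 1 right offset -0.5,-1
    }
    ### place the existing plot command after this loop
    ```

    Because the y positions are given in graph coordinates, the lines span the full plot height regardless of the y and y2 ranges, so no yLow/yHigh values need to be chosen.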