gnuplot time-series categorical-data

gnuplot timeseries with categorical states as step function


I have a datafile which looks something like this:

0;State a
1;State a
2;State b
3;State b
4;State a

Where the first column represents the time in seconds, and the second column represents a certain state.

I want to plot the occurrences of the events in gnuplot over time. I am trying to use the following for plotting:

set datafile separator ";"
plot 'data' using 1:2:yticlabels(2)

However I get the following error:

warning: Skipping data file with no valid points
                                       ^
x range is invalid

It seems that gnuplot doesn't recognize the strings as categorical values. The result should look something like a non-continuous step function:

       ^
State b┼       ┌───────┐
       │       │       │
State a┼───────┘       └────
       │
       ┼───┼───┼───┼───┼───┼─>
       0   1   2   3   4   5           

Is this sort of plot possible with gnuplot? If so, how would you do it?


Solution

  • No, gnuplot doesn't recognize strings as categorical values. You must do the "string → integer" mapping yourself.

    The easiest way to do this mapping is to use an external tool like awk and add the integer values on the fly. The following awk call assigns each state an integer in order of first appearance and appends it to every line as a third column:

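    awk -F ';' -v OFS=';' '{
      # assign the next free integer to a state seen for the first time
      if (!($2 in array)) {
        array[$2] = length(array)
      }
      # print time, state label, and the integer assigned to that state
      print $1, $2, array[$2]
    }' data.csv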
    

    Using the gnuplot syntax

    plot "< awk ..."
    

    you can combine the awk call directly with the plotting:

    set datafile separator ";"
    set offset 0.1,0.1,0.1,0.1
    set xtics 0,1
    plot "< awk -F ';' -v OFS=';' '{if (!($2 in array)) { array[$2] = length(array) }; print $1,$2,array[$2]}' data.csv" using 1:3:ytic(2) w step lw 3 notitle
    

    The output is a step plot of the states over time, with the state names as y-axis tick labels.

    Alternatively, if you don't have access to awk, you can do the preprocessing with, for example, a Python script like the following cat.py:

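    from __future__ import print_function
    import sys

    a = {}  # maps each state label to its integer index
    with open(sys.argv[1], 'r') as f:
        for line in f:
            fields = line.strip().split(';')
            # assign the next free integer to a state seen for the first time
            if fields[1] not in a:
                a[fields[1]] = len(a)
            # print time, state label, and the integer assigned to that state
            print("{0};{1};{2}".format(fields[0], fields[1], a[fields[1]]))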
    

    and call it with

    plot "< python cat.py data.csv" ...
    

    Side note: this can probably also be done with gnuplot alone, but that can become quite ugly; see "Gnuplot, plotting a graph with text on y axis" for a similar use case.
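
To see what the "string → integer" mapping produces before feeding it to gnuplot, here is a minimal sketch of the same idea as a standalone function. The names `add_state_index` and `sample` are hypothetical and not part of the original answer; the sketch assumes every non-empty line has the form `time;state`.

```python
def add_state_index(lines, sep=";"):
    """Append an integer index to each 'time;state' record.

    States are numbered 0, 1, 2, ... in order of first appearance.
    (Hypothetical helper for illustration; assumes two fields per line.)
    """
    index_of = {}  # state label -> integer index
    out = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        time, state = line.split(sep, 1)
        if state not in index_of:
            index_of[state] = len(index_of)
        out.append(sep.join([time, state, str(index_of[state])]))
    return out


# Sample data from the question
sample = ["0;State a", "1;State a", "2;State b", "3;State b", "4;State a"]
for row in add_state_index(sample):
    print(row)
# prints:
# 0;State a;0
# 1;State a;0
# 2;State b;1
# 3;State b;1
# 4;State a;0
```

The third column is what `using 1:3` plots, while `ytic(2)` takes the labels from the second column.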