gnuplot

gnuplot: How do I extract a date from a string to use it with timecolumn?


In some of my files, date/time is encoded in a character string, e.g. "subject_name/20221006085330_test_something". The following function returns the date (in this case "20221006"):

get_date(c) = substr( strcol(c), strstrt( strcol(c), "/" ) + 1, strstrt( strcol(c), "/" ) + 8 )

How can this function be used together with

stats $DAT i selected_block(1) u ( timecolumn( 2, timeFmt ) ) 

Replacing 2 with get_date("string") does not work, and I cannot explain the meaning of this error: unknown type in magnitude().

# Test script:
reset session

$DAT<<EOD
# cross
string    date     
subject001_name/20200203144309_session021_id     20200203
subject002_name/20200522103745_session079_id     20200522
subject003_name/20210521172330_session201_id     20210521
subject004_name/20220527130250_session047_id     20220527
subject005_name/20220701171757_session081_id     20220701
subject006_name/20220706102238_session003_id     20220706
subject007_name/20221006110417_session081_id     20221006
subject008_name/20230309144707_session901_id     20230309
subject009_name/20230907080602_session003_id     20230907
subject010_name/20240919143733_session022_id     20240919 
EOD

get_date(c) = substr( strcol(c), strstrt( strcol(c), "/" ) + 1, strstrt( strcol(c), "/" ) + 8 )

blocks              = "cross"
selected_block(i)   = word( blocks, i )

timeFmt             = "%Y%m%d"

stats $DAT i selected_block(1) u ( timecolumn( 2, timeFmt ) ) 

pause -1 

stats $DAT i selected_block(1) u ( timecolumn( get_date( "string" ), timeFmt ) ) 

Edit: The following adapted script works, but the statistics show 11 instead of 10 records.

# Test script:
reset session

$DAT<<EOD
# cross
string    date     
subject001_name/20200203144309_session021_id     20200203 
subject002_name/20200522103745_session079_id     20200522 
subject003_name/20210521172330_session201_id     20210521 
subject004_name/20220527130250_session047_id     20220527 
subject005_name/20220701171757_session081_id     20220701 
subject006_name/20220706102238_session003_id     20220706 
subject007_name/20221006110417_session081_id     20221006 
subject008_name/20230309144707_session901_id     20230309 
subject009_name/20230907080602_session003_id     20230907 
subject010_name/20240919143733_session022_id     20240919 
EOD

get_date(c) = substr( strcol(c), strstrt( strcol(c), "/" ) + 1, strstrt( strcol(c), "/" ) + 8 )

blocks              = "cross"
selected_block(i)   = word( blocks, i )

timeFmt             = "%Y%m%d"
date_column(col)    = strptime( timeFmt, get_date(col) )

stats $DAT i selected_block(1) u ( date_column("string") ) 

Solution

  • Well, the short answer would be: use strptime() instead.

    The function timecolumn(col,fmt) is explictely for the column number and the date/time format. You cannot insert a function to extract the date/time first. Use strptime() instead (check help strptime).

    Script:

    ### extract date/time from a string for stats
    reset session
    
    $Data <<EOD
    subject001_name/20221006085330_test_something    1   2   3
    subject002_name/20231007090000_test_something    4   5   6
    subject003_name/20241007101111_test_something    7   8   9
    EOD
    
    myFmt             = "%Y%m%d%H%M%S"
    get_datetime(col) = substr(_s=strcol(col), _t=strstrt(_s,"/")+1, _t+13)
    t(col)            = strptime(myFmt, get_datetime(col))
    
    set table $DateTime
        plot $Data u (get_datetime(1)) w table
    unset table
    print $DateTime
    
    stats $Data u (t(1)) nooutput
    print "Min: ", STATS_min, "  =  ", strftime("%Y-%m-%d %H:%M:%S",STATS_min)
    print "Max: ", STATS_max, "  =  ", strftime("%Y-%m-%d %H:%M:%S",STATS_max)
    ### end of script
    

    Result:

    20221006085330
    20231007090000
    20241007101111
    
    Min: 1665046410.0  =  2022-10-06 08:53:30
    Max: 1728295871.0  =  2024-10-07 10:11:11
    

    Addition 1:

    If your substring before the date was constant (which is not in your case) you could have used timecolumn() in the following way.

    myFmt = "subject_name/%Y%m%d%H%M%S"
    stats $Data u (timecolumn(1,myFmt)) nooutput
    

    Addition 2:

    Yet, another method using timecolumn() with variable pre-string. Before stats set the datafile separator to / and consider column 2 as the date/time column. However, this again implies that you have the same number of slashes / before the date.

    set datafile separator "/"
    myFmt = "%Y%m%d%H%M%S"
    stats $Data u (timecolumn(2,myFmt)) nooutput
    set datafile separator whitespace
    

    Addition 3:

    Here is a version with addressing columns by name, but avoiding that the header "string" is (mis)interpreted as a number (something like 6.91e-317, in fact 0) which would result in the date 1970-01-01 00:00:00. You need to insert a condition: if pseudocolumn 0 (i.e. $0 line number, zero-based) is smaller than 1 then return NaN which will not be taken into account for stats.

    Script:

    ### extract date/time from a string for stats
    reset session
    
    $Data <<EOD
    # cross
    string    date
    subject001_name/20200203144309_session021_id     20200203
    subject002_name/20200522103745_session079_id     20200522
    subject003_name/20210521172330_session201_id     20210521
    subject004_name/20220527130250_session047_id     20220527
    subject005_name/20220701171757_session081_id     20220701
    subject006_name/20220706102238_session003_id     20220706
    subject007_name/20221006110417_session081_id     20221006
    subject008_name/20230309144707_session901_id     20230309
    subject009_name/20230907080602_session003_id     20230907
    subject010_name/20240919143733_session022_id     20240919
    EOD
    
    myFmt             = "%Y%m%d%H%M%S"
    get_datetime(col) = ($0<1 ? NaN : \
        strptime(myFmt, substr(_s=strcol(col), _t=strstrt(_s,"/")+1, _t+13)))
    
    stats $Data u (get_datetime("string")) nooutput
    print "Records: ", STATS_records
    print "Min: ", STATS_min, "  =  ", strftime("%Y-%m-%d %H:%M:%S",STATS_min)
    print "Max: ", STATS_max, "  =  ", strftime("%Y-%m-%d %H:%M:%S",STATS_max)
    ### end of script
    

    Result:

    Records: 10
    Min: 1580740989.0  =  2020-02-03 14:43:09
    Max: 1726756653.0  =  2024-09-19 14:37:33