In some of my files, date/time is encoded in a character string, e.g. "subject_name/20221006085330_test_something". The following function returns the date (in this case "20221006"):
get_date(c) = substr( strcol(c), strstrt( strcol(c), "/" ) + 1, strstrt( strcol(c), "/" ) + 8 )
How can this function be used together with
stats $DAT i selected_block(1) u ( timecolumn( 2, timeFmt ) )
Replacing 2 with get_date("string") does not work, and I cannot explain the meaning of this error: unknown type in magnitude().
# Test script:
reset session
$DAT<<EOD
# cross
string date
subject001_name/20200203144309_session021_id 20200203
subject002_name/20200522103745_session079_id 20200522
subject003_name/20210521172330_session201_id 20210521
subject004_name/20220527130250_session047_id 20220527
subject005_name/20220701171757_session081_id 20220701
subject006_name/20220706102238_session003_id 20220706
subject007_name/20221006110417_session081_id 20221006
subject008_name/20230309144707_session901_id 20230309
subject009_name/20230907080602_session003_id 20230907
subject010_name/20240919143733_session022_id 20240919
EOD
get_date(c) = substr( strcol(c), strstrt( strcol(c), "/" ) + 1, strstrt( strcol(c), "/" ) + 8 )
blocks = "cross"
selected_block(i) = word( blocks, i )
timeFmt = "%Y%m%d"
stats $DAT i selected_block(1) u ( timecolumn( 2, timeFmt ) )
pause -1
stats $DAT i selected_block(1) u ( timecolumn( get_date( "string" ), timeFmt ) )
Edit: The following adapted script works, but the statistics show 11 instead of 10 records.
# Test script:
reset session
$DAT<<EOD
# cross
string date
subject001_name/20200203144309_session021_id 20200203
subject002_name/20200522103745_session079_id 20200522
subject003_name/20210521172330_session201_id 20210521
subject004_name/20220527130250_session047_id 20220527
subject005_name/20220701171757_session081_id 20220701
subject006_name/20220706102238_session003_id 20220706
subject007_name/20221006110417_session081_id 20221006
subject008_name/20230309144707_session901_id 20230309
subject009_name/20230907080602_session003_id 20230907
subject010_name/20240919143733_session022_id 20240919
EOD
get_date(c) = substr( strcol(c), strstrt( strcol(c), "/" ) + 1, strstrt( strcol(c), "/" ) + 8 )
blocks = "cross"
selected_block(i) = word( blocks, i )
timeFmt = "%Y%m%d"
date_column(col) = strptime( timeFmt, get_date(col) )
stats $DAT i selected_block(1) u ( date_column("string") )
Well, the short answer would be: use strptime()
instead.
The function timecolumn(col,fmt)
is explictely for the column number and the date/time format. You cannot insert a function to extract the date/time first. Use strptime()
instead (check help strptime
).
Script:
### extract date/time from a string for stats
reset session
$Data <<EOD
subject001_name/20221006085330_test_something 1 2 3
subject002_name/20231007090000_test_something 4 5 6
subject003_name/20241007101111_test_something 7 8 9
EOD
myFmt = "%Y%m%d%H%M%S"
get_datetime(col) = substr(_s=strcol(col), _t=strstrt(_s,"/")+1, _t+13)
t(col) = strptime(myFmt, get_datetime(col))
set table $DateTime
plot $Data u (get_datetime(1)) w table
unset table
print $DateTime
stats $Data u (t(1)) nooutput
print "Min: ", STATS_min, " = ", strftime("%Y-%m-%d %H:%M:%S",STATS_min)
print "Max: ", STATS_max, " = ", strftime("%Y-%m-%d %H:%M:%S",STATS_max)
### end of script
Result:
20221006085330
20231007090000
20241007101111
Min: 1665046410.0 = 2022-10-06 08:53:30
Max: 1728295871.0 = 2024-10-07 10:11:11
Addition 1:
If your substring before the date was constant (which is not in your case) you could have used timecolumn()
in the following way.
myFmt = "subject_name/%Y%m%d%H%M%S"
stats $Data u (timecolumn(1,myFmt)) nooutput
Addition 2:
Yet, another method using timecolumn()
with variable pre-string. Before stats
set the datafile separator to /
and consider column 2 as the date/time column. However, this again implies that you have the same number of slashes /
before the date.
set datafile separator "/"
myFmt = "%Y%m%d%H%M%S"
stats $Data u (timecolumn(2,myFmt)) nooutput
set datafile separator whitespace
Addition 3:
Here is a version with addressing columns by name, but avoiding that the header "string"
is (mis)interpreted as a number (something like 6.91e-317
, in fact 0
) which would result in the date 1970-01-01 00:00:00
. You need to insert a condition: if pseudocolumn 0 (i.e. $0
line number, zero-based) is smaller than 1
then return NaN
which will not be taken into account for stats
.
Script:
### extract date/time from a string for stats
reset session
$Data <<EOD
# cross
string date
subject001_name/20200203144309_session021_id 20200203
subject002_name/20200522103745_session079_id 20200522
subject003_name/20210521172330_session201_id 20210521
subject004_name/20220527130250_session047_id 20220527
subject005_name/20220701171757_session081_id 20220701
subject006_name/20220706102238_session003_id 20220706
subject007_name/20221006110417_session081_id 20221006
subject008_name/20230309144707_session901_id 20230309
subject009_name/20230907080602_session003_id 20230907
subject010_name/20240919143733_session022_id 20240919
EOD
myFmt = "%Y%m%d%H%M%S"
get_datetime(col) = ($0<1 ? NaN : \
strptime(myFmt, substr(_s=strcol(col), _t=strstrt(_s,"/")+1, _t+13)))
stats $Data u (get_datetime("string")) nooutput
print "Records: ", STATS_records
print "Min: ", STATS_min, " = ", strftime("%Y-%m-%d %H:%M:%S",STATS_min)
print "Max: ", STATS_max, " = ", strftime("%Y-%m-%d %H:%M:%S",STATS_max)
### end of script
Result:
Records: 10
Min: 1580740989.0 = 2020-02-03 14:43:09
Max: 1726756653.0 = 2024-09-19 14:37:33