Tags: c, timezone-offset, strptime, mktime

Parsing datetime string with UTC offset to time_t


This question is hinted at in this one, but the answer to that question doesn't answer this question at all, and I've found conflicting suggestions and hints scattered around.

My problem is relatively simple, but in digging into it, I'm getting a bit tripped up.

Suppose I have a string in a format like this: 2023-06-07 03:04:56 -0700

The goal is to normalize this into an epoch timestamp (time_t in C). I assumed this would be simple enough, but it seems not. The gotcha here seems to be the -0700 at the end.

It seems that strptime(3) possibly ignores the %z specifier (again, I've found conflicting reports as to how it is handled in different implementations). FWIW, I'm using Linux/glibc, so I care more about whether it works there than about whether it's in the C standard.

Playing around with it a little, it seemed to me that strptime does ignore the timezone offset: the hour in the struct tm is simply the hour in the string, not adjusted for the offset at all. Supposedly that's what the non-standard tm_gmtoff member is for, but reading it just gives me a gigantic value that is far larger than any UTC offset in seconds, so I'm not sure what to make of that either.

As an example:

#define _GNU_SOURCE  // glibc: declares strptime, timegm and setenv, and names the
                     // member tm_gmtoff (under plain _XOPEN_SOURCE it is __tm_gmtoff)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main()
{
    struct tm tm;
    time_t epoch;
    char buf[40];

    strcpy(buf, "2023-06-07 03:04:56 -0700");
    memset(&tm, 0, sizeof(tm));

    strptime(buf, "%Y-%m-%d %H:%M:%S %z", &tm);
    printf("Parsed datetime %s (hour %d, offset %lu)\n", buf, tm.tm_hour, tm.tm_gmtoff);
    tm.tm_isdst = -1;
    setenv("TZ", "US/Eastern", 1);
    epoch = mktime(&tm);
    printf("Parsed datetime -> epoch %lu\n", epoch); // 7:04AM UTC
    epoch = timegm(&tm);
    printf("Parsed datetime -> epoch %lu\n", epoch); // 3:04AM UTC
    return 0;
}

when run on https://www.onlinegdb.com/online_c_compiler, gives:

Parsed datetime 2023-06-07 03:04:56 -0700 (hour 3, offset 18446744073709526416)
Parsed datetime -> epoch 1686121496
Parsed datetime -> epoch 1686107096

Note that the -0700 offset in the string is arbitrary, and so is the local time zone of the system. For example, -0700 is Pacific Time (PDT), but the system could be in Eastern Time. That is completely irrelevant to the problem: the conversion should use the offset given in the string, not the local time zone, and the local time zone must not affect the answer.

Above, the correct answer is 10:04 AM UTC (what the string obviously should convert to). Blindly using mktime gives the wrong answer, and timegm is even further off. The problem seems to be that the offset is not taken into account. The timegm result would be correct if the struct tm had 7 hours added to it for the offset, or if timegm added 7 hours to the result based on something in the struct tm, such as tm_gmtoff. But neither of those things seems to happen.

Short of writing a manual function to parse the %z in the time string and add this offset to the time_t myself, is there a better "builtin" way of doing this with standard functions? (Portability isn't super important here, as long as it works in glibc.) Since this seems like a very common conversion, I'd expect there to be a proper way to do it without manual calculations, e.g. using gmtime. I thought this was what tm_gmtoff was for, but it seems otherwise. Am I missing something here?


Solution

  • A few issues ...

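    1. tm_gmtoff (spelled __tm_gmtoff under plain _XOPEN_SOURCE) is a signed long, so printing it with %lu shows -25200 as 2^64 - 25200 = 18446744073709526416. Use %ld.
    2. strptime does set tm_gmtoff correctly from %z (here -25200 = -7 * 3600). It just doesn't adjust tm_hour.
    3. The local time zone doesn't matter for this conversion. The -0400 seen below is US/Eastern in June (EDT); -0700 would be US/Pacific (PDT). If you do change TZ with setenv, call tzset() afterwards: mktime behaves as though tzset() was called, but localtime_r is not required to, which is why the US/Pacific experiment in FIX2 seemed not to work.
    4. timegm ignores tm_gmtoff: it treats the fields as UTC.
    5. On Linux/glibc the member is tm_gmtoff when _GNU_SOURCE (or _DEFAULT_SOURCE) is defined, and __tm_gmtoff otherwise.
    6. Better to use timegm and subtract tm_gmtoff manually to get the correct UTC epoch: timegm(&tm) - offset, saving the offset before calling timegm.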

    Here is the somewhat corrected code, in stages (ORIG, FIX1, FIX2). It may still have problems, so read the comments:

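    // _GNU_SOURCE (rather than plain _XOPEN_SOURCE) declares timegm and setenv
    // and names the member tm_gmtoff instead of __tm_gmtoff
    #define _GNU_SOURCE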
    
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    
    void
    sepline(const char *tag)
    {
    
        printf("\n");
        for (int col = 1;  col <= 80;  ++col)
            putchar('-');
        printf("\n");
        printf("%s:\n",tag);
        printf("\n");
    }
    
    void
    tmshow(const struct tm *tm,const char *tag)
    {
    
        printf("TMX: %4.4d/%2.2d/%2.2d-%2.2d:%2.2d:%2.2d (%ld/%ld) (from %s)\n",
            tm->tm_year + 1900,tm->tm_mon + 1,tm->tm_mday,
            tm->tm_hour,tm->tm_min,tm->tm_sec,tm->tm_gmtoff,tm->tm_gmtoff / 3600,
            tag);
    }
    
    void
    todshow(time_t tod,int gmtflg,const char *tag)
    {
        struct tm tm;
    
        if (gmtflg)
            gmtime_r(&tod,&tm);
        else
            localtime_r(&tod,&tm);
    
        printf("\n");
        printf("TOD: %ld (from %s)\n",tod,tag);
        tmshow(&tm,tag);
    }
    
    void
    orig(const char *buf)
    {
        struct tm tm;
        memset(&tm, 0, sizeof(tm));
    
        sepline("ORIG");
    
        printf("BUF: %s\n",buf);
        strptime(buf, "%Y-%m-%d %H:%M:%S %z", &tm);
        // %lu reproduces the original bug: tm_gmtoff is a signed long (use %ld)
        printf("Parsed datetime %s (hour %d, offset %lu)\n",
            buf, tm.tm_hour, tm.tm_gmtoff);
    
        tm.tm_isdst = -1;
        setenv("TZ", "US/Eastern", 1);
        tmshow(&tm,"strptime");
    
        time_t epoch_mktime = mktime(&tm);
        printf("Parsed mktime -> epoch %lu\n", epoch_mktime);   // 7:04AM UTC
    
        time_t epoch_timegm = timegm(&tm);
        printf("Parsed timegm -> epoch %lu\n", epoch_timegm);   // 3:04AM UTC
    
        time_t diff = epoch_mktime - epoch_timegm;
        printf("diff = %ld (%.3f)\n",diff,diff / 3600.0);
    }
    
    void
    fix1(const char *buf)
    {
        struct tm tm;
        memset(&tm, 0, sizeof(tm));
    
        sepline("FIX1");
    
        printf("BUF: %s\n",buf);
        strptime(buf, "%Y-%m-%d %H:%M:%S %z", &tm);
    #if 0
        printf("Parsed datetime %s (hour %d, offset %ld/%ld)\n",
            buf, tm.tm_hour, tm.tm_gmtoff, tm.tm_gmtoff / 3600);
    #endif
    
        tm.tm_isdst = -1;
        //setenv("TZ", "US/Eastern", 1);
        unsetenv("TZ");
        tmshow(&tm,"strptime");
    
        time_t epoch_mktime = mktime(&tm);
        //printf("Parsed mktime -> epoch %lu\n", epoch_mktime); // 7:04AM UTC
        todshow(epoch_mktime,0,"mktime");
    
        time_t epoch_timegm = timegm(&tm);
        //printf("Parsed timegm -> epoch %lu\n", epoch_timegm); // 3:04AM UTC
        todshow(epoch_timegm,1,"timegm");
    
        time_t diff = epoch_mktime - epoch_timegm;
        printf("diff = %ld (%.3f)\n",diff,diff / 3600.0);
    }
    
    void
    fix2(const char *buf)
    {
        struct tm tm;
        memset(&tm, 0, sizeof(tm));
    
        sepline("FIX2");
    
        printf("BUF: %s\n",buf);
        strptime(buf, "%Y-%m-%d %H:%M:%S %z", &tm);
        tmshow(&tm,"strptime");
    
        // NOTE: timegm ignores this -- so remember it
        time_t offset = tm.tm_gmtoff;
    
        //tm.tm_gmtoff = 0;
        time_t epoch_timegm = timegm(&tm);
        todshow(epoch_timegm,1,"timegm");
    
        // adjust for timezone -- this produces correct GMT
        todshow(epoch_timegm - offset,1,"timegm+offset");
    
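    // NOTE: setenv("TZ") alone did not take effect here: localtime_r is not
    // required to re-read TZ (mktime is), so call tzset() after changing it
    #if 0
        time_t epoch_mktime = epoch_timegm;
        epoch_mktime -= offset;
        setenv("TZ", "US/Pacific", 1);
        tzset();
        localtime_r(&epoch_mktime,&tm);
    #endif
    #if 1
        // rebuild the original wall-clock fields (03:04:56) and put the offset
        // back; the "localtime_r" tag below is kept from the experiment above
        time_t epoch_mktime = epoch_timegm;
        gmtime_r(&epoch_mktime,&tm);
        tm.tm_gmtoff += offset;
    #endif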
    
        //printf("Parsed mktime -> epoch %lu\n", epoch_mktime); // 7:04AM UTC
        tmshow(&tm,"localtime_r");
    
        time_t diff = epoch_mktime - epoch_timegm;
        printf("diff = %ld (%.3f)\n",diff,diff / 3600.0);
    }
    
    int
    main()
    {
        char buf[40];
    
        // -0700 is e.g. US/Pacific during daylight saving time (PDT)
        strcpy(buf, "2023-06-07 03:04:56 -0700");
    
        orig(buf);
        fix1(buf);
        fix2(buf);
    
        return 0;
    }
    

    Here is the program output:

    
    --------------------------------------------------------------------------------
    ORIG:
    
    BUF: 2023-06-07 03:04:56 -0700
    Parsed datetime 2023-06-07 03:04:56 -0700 (hour 3, offset 18446744073709526416)
    TMX: 2023/06/07-03:04:56 (-25200/-7) (from strptime)
    Parsed mktime -> epoch 1686121496
    Parsed timegm -> epoch 1686107096
    diff = 14400 (4.000)
    
    --------------------------------------------------------------------------------
    FIX1:
    
    BUF: 2023-06-07 03:04:56 -0700
    TMX: 2023/06/07-03:04:56 (-25200/-7) (from strptime)
    
    TOD: 1686121496 (from mktime)
    TMX: 2023/06/07-03:04:56 (-14400/-4) (from mktime)
    
    TOD: 1686107096 (from timegm)
    TMX: 2023/06/07-03:04:56 (0/0) (from timegm)
    diff = 14400 (4.000)
    
    --------------------------------------------------------------------------------
    FIX2:
    
    BUF: 2023-06-07 03:04:56 -0700
    TMX: 2023/06/07-03:04:56 (-25200/-7) (from strptime)
    
    TOD: 1686107096 (from timegm)
    TMX: 2023/06/07-03:04:56 (0/0) (from timegm)
    
    TOD: 1686132296 (from timegm+offset)
    TMX: 2023/06/07-10:04:56 (0/0) (from timegm+offset)
    TMX: 2023/06/07-03:04:56 (-25200/-7) (from localtime_r)
    diff = 0 (0.000)