NOTE: There are a few other questions that are very similar to this and which deal with the same kind of issues, but I really felt they did not articulate it well enough for me. So I am asking this question with hopefully a pretty clear explanation and also pretty clear visual examples in the hope that it resonates with as many Postgres newbies, and solicits the best explanations and solutions, as possible.
I have CSV file with a text field that represents a timestamp taken from within a specific timezone, but the timezone is not included/represented - you just have to know that the timestamps are unique to one particular TZ, and it is not the same TZ as the system TZ on the system hosting PG.
Example:
Let's say my system is in the TZ +02:00
, and the CSV contains timestamps from TZ GMT
(or +00:00
, for the sake of simplicity).
Let's also say the TEXT STRING which represents the timestamp is formatted as follows:
DD.MM.YYYY HH:mm:ss.uuu
where uuu
means milliseconds, or a precision of 3-decimal places.
I want to import or INSERT all rows from the CSV into my TABLE, but I need to:
DD.MM.YYYY HH:mm:ss.uuu
,That is, the table itself must represent GMT timestamps as read from the source CSV TEXT.
So, the following result of the INSERT action would be incorrect:
However, this would be the correct result of the INSERT statement and the timestamp TZ interpretation:
There is a lot of information about timestamps and timezones in the Postgres documentation, but most of it seems to be about how to format/display timestamp output when running SELECT queries on a table with timestamp data stored in a column.
What I am asking about is fundamentally different: it is how to tell Postgres how to read the format of a TEXT string being INSERTed into a table and to convert it to the correct timestamp in the correct timezone...as opposed to how to display an internal timestamp in the format you want to see it in.
So my questions are:
Edit: adding code examples.
I have used this syntax, which is accepted and works:
INSERT INTO myTable (datetime,sys_id,cputil,memfree,sessnum) VALUES ('2019/05/03 09:41:14 +00:00',100,0.55,0.59,63) RETURNING *;
and also this syntax, which was also recognized and worked (with daylight savings, even!):
INSERT INTO myTable (datetime,sys_id,cputil,memfree,sessnum) VALUES ('23.12.2020 13:51:35 +00:00',100,0.76,0.22,71) RETURNING *;
but my main concern is that date formats can be very ambiguous and I don't want to assume Postgres will always 'just get it right'. As everyone who has spent 1 minute pondering dates knows, the first 12 days of every month can be misread! I would much rather force the correct format interpretation and know there will not be any rows INSERTed with the wrong date because the date and month were swapped, or where the hour is 12 hours out of phase due to not assuming a 24hr time.
I want deterministic timestamp interpretation.
If you want to store UTC timestamps in a timestamp with time zone
, tell PostgreSQL that your data are in UTC. And you better use an explicit format to convert the string as well:
INSERT INTO tab (tstzcol)
VALUES (to_timestamp(
/* your string goes here */ || '+00',
'DD.MM.YYYY HH24:MI:SS.FF3 TZH'
));
The +00
is a time zone offset from UTC that is parsed by the TZH
format. So essentially you are saying “this timestamp is offset 0 hours from UTC”, which means that it is an UTC timestamp.