As per this, to_timestamp converts it to local machine's timezone.
My machine timezone is UTC.
When running this :
select
to_timestamp("2024-09-20 19-00-00+0530","yyyy-MM-dd HH-mm-ssZ") as ist_offset,
to_timestamp("2024-09-20 19-00-00IST","yyyy-MM-dd HH-mm-ssz") as ist_abbreviation
I'm getting this output :
ist_offset | ist_abbreviation |
---|---|
2024-09-20 13:30:00.000 | 2024-09-20 17:00:00.000 |
For ist_offset
it generated correct timezone, but for ist_abbreviation
it did some conversion but not sure to which timezone and why it's giving different ?
I looked online at pyspark documentation and different blogs as well, but unable to find anything. Can someone help me how to do timezone conversion with abbreviation as timezone as I'm restricted not to use offset or proper timezone like "Asia/Kolkata".
I feel there might be ambiguity for IST which could be either Indian Standard Time (UTC+5:30) , Irish Standard Time (UTC+01:00), or Israel Standard Time (UTC+02:00)
You either need to add offset or the specify timezone
Maybe a udf like this to be specific -
def parse_timestamp_with_ist(timestamp_str):
if 'IST' in timestamp_str:
dt = datetime.strptime(timestamp_str, "%Y-%m-%d %H-%M-%SIST")
return dt - timedelta(hours=5, minutes=30)
else:
return datetime.strptime(timestamp_str, "%Y-%m-%d %H-%M-%S")