I'm trying to convert a string value in the format "30 May 2024" to a date, using Spark SQL in Microsoft Fabric.
The initial column, called TITLE, has text in the following format:
"SOME TEXT - 30th May 2024 11:26" "SOME TEXT - 3rd Dec 2022 14:38"
I've successfully got it to the point that it looks like this:
"30 May 2024" "3 Dec 2022"
And now I need to convert it to a date format.
I'm trying to devise the logic using the code below. I've separated the existing logic into a CTE, and I'm hoping someone can advise on what I need to do instead of "to_timestamp", or what further wrangling I need to do to allow to_timestamp to work.
I'm aware this is all starting to look a bit horrible... something similar worked in SQL Server, using TRY_CONVERT (and CHARINDEX instead of INSTR), and I'm trying to adapt it for Spark SQL:
out = spark.sql(f"""
with cte1 as (
SELECT
TITLE
,SYSTEMONLY as System_Only
,IF(SYSTEMONLY = "Yes"
,IF(len(TITLE) <= 28 AND TITLE LIKE '&Snapshot%',
"PLACEHOLDER",
REPLACE(
REPLACE(
REPLACE(
REPLACE(
SUBSTRING(
TRIM(
SUBSTRING(
TITLE,
INSTR(
TITLE,' - '
) + 3,
LEN(TITLE)
)
),
1,
len(
TRIM(
SUBSTRING(
TITLE,
INSTR(
TITLE,' - '
) + 3,
LEN(TITLE)
)
)
) -6)
,'th','')
,'nd','')
,'st','')
,'rd','')
)
,NULL
) as Budget_Date
FROM LakehouseName.TableName
WHERE SYSTEMONLY = 'Yes'
)
SELECT TITLE, System_Only, Budget_Date,
to_timestamp(Budget_Date,'dd/MM/yyyy') as FOrmatted
FROM cte1
""")
display(out)
You've only given one example, so I'm making some assumptions (2 digits for day, 3 characters for month).
to_date ('30 May 2024', 'dd MMM yyyy')
That will return (as a date) 2024-05-30
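A quick way to sanity-check the pattern in a Fabric notebook cell (assuming the same spark session as in your snippet):

# Quick check of the format string against the cleaned-up value
spark.sql("SELECT to_date('30 May 2024', 'dd MMM yyyy') AS d").show()
# +----------+
# |         d|
# +----------+
# |2024-05-30|
# +----------+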
Depending on your version of Spark, you may need to run this first:
spark.conf.set("spark.sql.legacy.timeParserPolicy","LEGACY")
EDIT: That format string also seems to work with single-digit days and the full month name.
to_date ('1 November 2024', 'dd MMM yyyy')
That returns 2024-11-01.
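If it helps, here's roughly how that could slot into your query. It's an untested sketch against your LakehouseName.TableName, it leaves out your Snapshot/placeholder branch for brevity, and it swaps the nested REPLACEs for a regexp_replace that only strips st/nd/rd/th when they directly follow a digit. I've used 'd MMM yyyy' so both one- and two-digit days parse:

# Untested sketch: same idea as your CTE, with a regex for the ordinal suffix and to_date at the end
out = spark.sql("""
with cte1 as (
    SELECT
        TITLE
        ,SYSTEMONLY as System_Only
        -- text after ' - ', minus the trailing ' HH:mm', with 30th -> 30, 3rd -> 3, etc.
        ,regexp_replace(
            SUBSTRING(
                TRIM(SUBSTRING(TITLE, INSTR(TITLE, ' - ') + 3)),
                1,
                LENGTH(TRIM(SUBSTRING(TITLE, INSTR(TITLE, ' - ') + 3))) - 6
            ),
            '([0-9])(st|nd|rd|th)', '$1'
        ) as Budget_Date
    FROM LakehouseName.TableName
    WHERE SYSTEMONLY = 'Yes'
)
SELECT TITLE, System_Only, Budget_Date,
       to_date(Budget_Date, 'd MMM yyyy') as Formatted
FROM cte1
""")
display(out)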