I have a column with dates (in a string format) in Dataprep: yyyymmdd. I would like it to become a datetime object. Which function/transformation should I apply to achieve this result automatically?
In this case, you actually don't need to apply a transformation at all—you can just change column type to Date/Time and select the appropriate format options.
Note: This is one of the least intuitive parts of Dataprep as you have to select an incorrect format (in this case yy-mm-dd
) before you can drill-down to the correct format (yyyymmdd
).
Here's a screenshot of the Date / Time type window to illustrate this:
While it's unintuitive, this will correctly treat the column as a date in future operations, including assigning the correct type in export operations (e.g. BigQuery).
Through the UI, this will generate the following Wrangle Script:
settype col: YourDateCol customType: 'Datetime','yy-mm-dd','yyyymmdd' type: custom
According to the documentation, this should also work (and is more succinct):
settype col: YourDateCol type: 'Datetime','yy-mm-dd','yyyymmdd'
Note that if you absolutely needed to do this in a function context, you could extract the date parts using SUBSTRING
/LEFT
/RIGHT
and pass them to the DATE
or DATETIME
function to construct a datetime object. As you've probably already found, DATEFORMAT
will return NULL if the source column isn't already of type Datetime
.
(From a performance standpoint though, it would probably be far more efficient for a large dataset to either just change the the or create a new column with the correct type versus having to perform those extra operations on so many rows.)