google-cloud-platformgoogle-cloud-dataprep

Dataprep change str yyyymmdd date to datetime column


I have a column with dates (in a string format) in Dataprep: yyyymmdd. I would like it to become a datetime object. Which function/transformation should I apply to achieve this result automatically?


Solution

  • In this case, you actually don't need to apply a transformation at all—you can just change column type to Date/Time and select the appropriate format options.

    Note: This is one of the least intuitive parts of Dataprep as you have to select an incorrect format (in this case yy-mm-dd) before you can drill-down to the correct format (yyyymmdd).

    Here's a screenshot of the Date / Time type window to illustrate this: Google Cloud Dataprep Date/Time type picker dialog showing the yy-mm-dd type and yyyymmdd subtype selected

    While it's unintuitive, this will correctly treat the column as a date in future operations, including assigning the correct type in export operations (e.g. BigQuery).

    Through the UI, this will generate the following Wrangle Script:

    settype col: YourDateCol customType: 'Datetime','yy-mm-dd','yyyymmdd' type: custom
    

    According to the documentation, this should also work (and is more succinct):

    settype col: YourDateCol type: 'Datetime','yy-mm-dd','yyyymmdd'
    

    Note that if you absolutely needed to do this in a function context, you could extract the date parts using SUBSTRING/LEFT/RIGHT and pass them to the DATE or DATETIME function to construct a datetime object. As you've probably already found, DATEFORMAT will return NULL if the source column isn't already of type Datetime.

    (From a performance standpoint though, it would probably be far more efficient for a large dataset to either just change the the or create a new column with the correct type versus having to perform those extra operations on so many rows.)