google-cloud-data-fusioncdap

Convert to date in cloud datafusion


How do we convert a string to date in cloud datafusion?

I have a column with the value say 20191120 (format of yyyyMMdd) i want to load this into a table in bigquery as date. The table column datatype is also date.

What i have tried so far is that i converted the string to timestamp using "parse-as-simple-date" and i try to convert it to format using format-date to "yyyy-MM-dd", but this step converts it to string and the final load fails. I have even tried to explicitly mention the column as date in the o/p schema as date. But it fails at runtime.

I tried keeping it as timestamp in the pipeline and try loading the date into Bigquery date type.

I noticed in the error that came op was field dt_1 incompatible with avro integer. Is datafusion internally converting the extract into avro before loading. AVRO does not have a date datatype which is causing the isssue?


Solution

  • Adding answer for posterity:

    You can try doing these,

    1. Go to LocalDateTime column in wrangler
    2. Open dropdown and click on "Custom Transform"
    3. Type timestamp.toLocalDate() (timestamp being the column name)

    After the last step it should convert it into LocalDate type which you can write to bigquery. Hope this helps