db2 parquet ibm-cloud-sql-query

How to flatten a Parquet array data type when using IBM Cloud SQL Query


I need to push data from a Parquet file, which I am reading with IBM Cloud SQL Query, to Db2 on Cloud.

My Parquet file has columns of array type, and I want to push those to Db2 on Cloud as well.

Is there any way to push the array data from the Parquet file to Db2 on Cloud?


Solution

  • Have you checked out this advice in the documentation?

    https://cloud.ibm.com/docs/services/sql-query?topic=sql-query-overview#limitations

    If a JSON, ORC, or Parquet object contains a nested or arrayed structure, a query with CSV output using a wildcard (for example, SELECT * from cos://...) returns an error such as "Invalid CSV data type used: struct." Use one of the following workarounds:

    • For a nested structure, use the FLATTEN table transformation function.
    • Alternatively, you can specify the fully nested column names instead of the wildcard, for example, SELECT address.city, address.street, ... from cos://....
    • For an array, use the Spark SQL explode() function, for example, select explode(contact_names) from cos://....
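
    As a minimal sketch of both workarounds (the bucket path, object name, and column names below are placeholders, not from the original question):

        -- Array column: explode() emits one row per array element
        SELECT id, explode(contact_names) AS contact_name
        FROM cos://us-geo/my-bucket/contacts.parquet STORED AS PARQUET

        -- Nested struct columns: FLATTEN expands them into flat, top-level columns
        SELECT *
        FROM FLATTEN(cos://us-geo/my-bucket/contacts.parquet STORED AS PARQUET)

    Once the result set is flat, it can be written out as CSV and then loaded into Db2 on Cloud.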