db2 parquet ibm-cloud-sql-query

How to flatten a Parquet array data type when using IBM Cloud SQL Query


I need to push data from a Parquet file, which I am reading with IBM Cloud SQL Query, to Db2 on Cloud.

My Parquet file has columns of array type, and I want to push those to Db2 on Cloud as well.

Is there any way to push the array data from the Parquet file to Db2 on Cloud?


Solution

  • Have you checked out this advice in the documentation?

    https://cloud.ibm.com/docs/services/sql-query?topic=sql-query-overview#limitations

    If a JSON, ORC, or Parquet object contains a nested or arrayed structure, a query with CSV output using a wildcard (for example, SELECT * from cos://...) returns an error such as "Invalid CSV data type used: struct." Use one of the following workarounds:

    • For a nested structure, use the FLATTEN table transformation function.
    • Alternatively, you can specify the fully nested column names instead of the wildcard, for example, SELECT address.city, address.street, ... from cos://....
    • For an array, use the Spark SQL explode() function, for example, select explode(contact_names) from cos://....
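
    As a minimal sketch of both workarounds (the bucket path, object name, and column names below are placeholders, not from the original question):

        -- Array column: explode() emits one row per array element
        SELECT id, explode(contact_names) AS contact_name
        FROM cos://us-geo/my-bucket/contacts.parquet STORED AS PARQUET

        -- Nested struct columns: FLATTEN expands them into flat, top-level columns
        SELECT *
        FROM FLATTEN(cos://us-geo/my-bucket/contacts.parquet STORED AS PARQUET)

    Once the result set is flat, it can be written out as CSV and then loaded into Db2 on Cloud.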