oracle-database, amazon-s3, oracle19c, dbms-cloud

ORA-20000: missing column list for dbms_cloud external table in Parquet format


I'm currently trying to create an external table for a Parquet file in an S3 bucket. This works:

begin
  dbms_cloud.create_external_table(
    table_name =>'test_parquet'
    ,credential_name =>'&my_credential'
    ,file_uri_list =>'https://s3.eu-central-1.amazonaws.com/&my_bucket/test.parquet'
    ,format =>  '{"type": "parquet",  "schema": "first"}'
    ,column_list => 'x number, y number'
  );
end;

However, I'd like to omit the column list. According to the documentation and an Oracle Blog post, this should be possible:

begin
  dbms_cloud.create_external_table(
    table_name =>'test_parquet'
    ,credential_name =>'&my_credential'
    ,file_uri_list =>'https://s3.eu-central-1.amazonaws.com/&my_bucket/test.parquet'
    ,format =>  '{"type": "parquet",  "schema": "first"}'
  );
end;

Oracle is supposed to inspect the Parquet file and determine the column information automatically. But this gives me an error:

ORA-20000: Missing column list
ORA-06512: at "C##CLOUD$SERVICE.DBMS_CLOUD", line 1181
ORA-06512: at "C##CLOUD$SERVICE.DBMS_CLOUD", line 3158
ORA-06512: at line 2

Things I've tried:

Additional information:

• Database: Oracle 19.26, on-premises (not Autonomous Database)

How can I trick Oracle into automatically determining the column information?


Solution

  • You’re not missing anything obvious — this one is just a bit misleading unless you’ve run into it before.

    The part in the docs about using schema: "first" or "all" to auto-detect columns from Parquet metadata is technically true, but there's a catch: it only works in Oracle Autonomous Database, not in on-premises installations like your Oracle 19.26 setup.

    That’s why you’re seeing ORA-20000: Missing column list. The on-prem version of DBMS_CLOUD just doesn’t support automatic column inference from Parquet files. So even if you set schema: "first", it still expects you to provide a column_list. Basically, that feature isn’t implemented the same way in 19c as it is in the cloud version.

    What you can do

    If you know your Parquet schema ahead of time (or it doesn't change much), the most straightforward fix is to specify the columns manually, as in your working example. It's a bit tedious, but on-prem that's the only way to get it to work.

    If your files come from a consistent source, another option is to use a tool like:

    • parquet-tools
    • Python with pyarrow

    These can read the schema from the Parquet file so you can generate the column_list dynamically in a script.
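
    As a rough sketch of the pyarrow route (assuming the file is available locally, e.g. after downloading it from S3), something like the following turns the Parquet schema into a column_list string. The oracle_type mapping here is only an illustration covering a few common cases, not a complete Parquet-to-Oracle type mapping:

      # Sketch only: derive an Oracle column_list from a Parquet file's metadata.
      # The Arrow-to-Oracle type mapping is an assumption; extend it for your data.
      import pyarrow as pa
      import pyarrow.parquet as pq

      def oracle_type(arrow_type: pa.DataType) -> str:
          if pa.types.is_integer(arrow_type) or pa.types.is_floating(arrow_type):
              return "NUMBER"
          if pa.types.is_timestamp(arrow_type):
              return "TIMESTAMP"
          if pa.types.is_date(arrow_type):
              return "DATE"
          return "VARCHAR2(4000)"  # fallback for strings and anything unhandled

      # read_schema only reads the file footer, so this is cheap even for large files
      schema = pq.read_schema("test.parquet")
      column_list = ", ".join(f"{f.name} {oracle_type(f.type)}" for f in schema)
      print(column_list)  # e.g. "x NUMBER, y NUMBER"

    You can then splice the generated string into the column_list parameter of your create_external_table call as part of your load script.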

    TL;DR:

    You’re right that the docs suggest auto-detection should work, but that feature is only available in Oracle Autonomous Database. For on-prem 19c, you’ll have to provide the column list manually.