oracle-databaseoracle-autonomous-db

Load selective fields from CSV in object storage to ADW using external tables


Can we load only selective columns from a flat file into ADW using External Table? Customer has a huge flat file that gets generated and they want a way to see if there is a way they could just be able to bring only their desired columns from the file into ADW

This is what I tried so far: CSV file in Object storage has three columns, I am trying to just pull two of them here:

            BEGIN  
   DBMS_CLOUD.CREATE_EXTERNAL_TABLE(   
      table_name =
            >
            'gentherm_canalyser5'
            ,   
      credential_name =
            >
            'DBCREDENTIAL_123'
            ,   
      file_uri_list =
            >
            'https://objectstorage.us-ashburn-1.oraclecloud.com/n/natdsepltfrmanalyticshrd1/b/TEST/o/123test_can.csv'
            ,
      format =
            > json_object(
            'type' value 'csv'
            , 'skipheaders' value '1'
            )
            , 
      field_list =
            > '"Time" decimal,
                     "dutyMax" decimal'
            ,  
      column_list =
            > 'time number,
                      dutymax number'
            )
            ;
   END
            ;
/ 

I keep getting this error:

ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-03025: For CSV format files, field Time is not delimited.  

Any pointers, how this could work?


Solution

  • Create the external table defining all the columns in the CSV file and then only SELECT the ones you are interested in. With the external table definition you cannot pretend stuff is not physically in the file.

    So assuming you had an additional trailing column FOOBAR NUMBER that you don't want to deal with, your code and processing would look as follows:

    BEGIN  
       DBMS_CLOUD.CREATE_EXTERNAL_TABLE(   
          table_name =
                >
                'gentherm_canalyser5'
                ,   
          credential_name =
                >
                'DBCREDENTIAL_123'
                ,   
          file_uri_list =
                >
                'https://objectstorage.us-ashburn-1.oraclecloud.com/n/natdsepltfrmanalyticshrd1/b/TEST/o/123test_can.csv'
                ,
          format =
                > json_object(
                'type' value 'csv'
                , 'skipheaders' value '1'
                )
                , 
          field_list =
                > '"Time" decimal,
                         "dutyMax" decimal'
                ,  
          column_list =
                > 'time number,
                          dutymax number, FOOBAR NUMBER'
                )
                ;
       END
                ;
    / 
    

    .. and then you simply only do a

    SELECT time, dutyMax FROM gentherm_canalyser5;
    

    .. to process your columns of interest.