Tags: oracle-database, datapump, character-set

Oracle data pump: WE8MSWIN1252 source to AL32UTF8 target


The situation: we receive a handful of data pump dump files from external sources on a daily basis. We have no (well, extremely limited) control over the source. The source database character set is WE8MSWIN1252. This worked for years, but now we are moving to a new platform and all databases on our end will be AL32UTF8. This is not a problem unless a row's data fills the full column width and contains characters that expand to multiple bytes in AL32UTF8. Then, on import, you get the ol' ORA-02374 "conversion error" followed by ORA-12899 "value too large for column". The problem is that the source defines its VARCHAR2 columns with BYTE semantics, when what is needed is CHAR.
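To see why this happens (a minimal sketch run in an AL32UTF8 database; the table names and widths are made up for illustration): a 5-character value containing an accented character takes 6 bytes in AL32UTF8, so it overflows a column sized with BYTE semantics but fits the same declared width with CHAR semantics.

    -- Hypothetical demo tables, one per length semantic
    create table DEMO_BYTE ( VAL varchar2(5 BYTE) );
    create table DEMO_CHAR ( VAL varchar2(5 CHAR) );

    -- 'cañon' is 5 characters but 6 bytes in AL32UTF8
    insert into DEMO_BYTE values ('cañon');   -- ORA-12899: value too large for column
    insert into DEMO_CHAR values ('cañon');   -- succeeds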

I have been able to create a repeatable case for this in our own lab, so I can test solutions on both the export and import side. Some have suggested setting the environment variable NLS_LANG to AMERICAN_AMERICA.AL32UTF8 before running expdp. Nope, not it. Some have suggested an alter session to set NLS_LENGTH_SEMANTICS='CHAR' before running the import. Again, not it. Independently, I worked out that I could pull the DDL, create the tables ahead of time swapping out "BYTE" for "CHAR", and then import with TABLE_EXISTS_ACTION set to TRUNCATE (sketched below). That works, but it's not sustainable: this is not a one-time migration but a daily process, for one dump in particular we are talking about 300+ tables, and, although unlikely, because we have no control over the source the table/column structure may change without warning.
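For reference, that manual workaround looks roughly like this for a single table (the schema, table, directory, and dump file names here are placeholders; the real DDL comes from the source dump):

    -- Pre-create the table with explicit CHAR semantics before the import
    create table APP_OWNER.CUSTOMER
    ( CUST_NAME varchar2(100 CHAR),
      CUST_CITY varchar2(50 CHAR)
    );

    -- Then load only the data into the pre-created table:
    --   impdp system@target directory=DP_DIR dumpfile=daily.dmp
    --         tables=APP_OWNER.CUSTOMER table_exists_action=TRUNCATE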

Of course, I could gin up some code to pull the DDL each time, rewrite the columns as "CHAR", and then import, but that's a bit too much of a Rube Goldberg machine for me; it just seems too rickety and not sustainable.

Is there some parameter on the import side to tell data pump to create all VARCHAR2 columns as CHAR instead of BYTE? My research hasn't completely closed this door, so I'm still looking, but I've hit enough walls that I wanted to post my own question for input. Oh, and yes, I know that if there happens to be a VARCHAR2(4000), using CHAR instead of BYTE won't help, because the column is still capped at 4000 bytes regardless of semantics. But so far it seems that, for the most part, the source resorts to CLOBs for larger narrative data.


Solution

  • One possibility is to create a DDL trigger in the imported schemas that captures CREATE TABLE statements and changes the column definitions from BYTE to CHAR semantics.

    Create a log table to keep track of the modified tables.

    create table LOG_CREATE_TABLE
    ( OPERATION varchar2(25),
      OWNER varchar2(100),
      NAME varchar2(100)
    );
    

    Create the trigger.

    create or replace trigger TABLE_DDL_TRIGGER
    after create on SCHEMA
    BEGIN
      -- Only act on CREATE TABLE, not on other CREATE statements
      if ora_dict_obj_type = 'TABLE' then
        -- Record which table was picked up by the trigger
        insert into LOG_CREATE_TABLE (OPERATION, OWNER, NAME)
        values (ora_sysevent, ora_dict_obj_owner, ora_dict_obj_name);

        -- Convert every BYTE-semantics character column to CHAR semantics,
        -- skipping columns already close to the 4000-byte limit
        FOR l_col IN (select TABLE_NAME, COLUMN_NAME, DATA_TYPE, CHAR_LENGTH
                        from USER_TAB_COLS
                       where TABLE_NAME = ora_dict_obj_name
                         and CHAR_USED = 'B'
                         and DATA_LENGTH < 3500) LOOP
          EXECUTE IMMEDIATE 'alter table ' || l_col.TABLE_NAME || ' modify ' ||
            l_col.COLUMN_NAME || ' ' || l_col.DATA_TYPE ||
            '(' || l_col.CHAR_LENGTH || ' CHAR)';
        END LOOP;
      end if;
    END;
    /
    

    You can test the trigger by creating a simple table.

    create table t2
    ( col1 varchar2(25),
      col2 varchar2(25),
      col3 varchar2(25),
      col4 varchar2(4000) );
    

    Check the column definitions for the new table.

    select TABLE_NAME, COLUMN_NAME, DATA_TYPE, DATA_LENGTH, CHAR_LENGTH, CHAR_USED
    from user_tab_cols where table_name = 'T2';
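
    Assuming the trigger fired in an AL32UTF8 database, you should see something like the following: the 25-character columns are converted to CHAR semantics (DATA_LENGTH becomes 100 because each character can take up to 4 bytes), while COL4 is left in BYTE semantics because of the DATA_LENGTH < 3500 filter.

    TABLE_NAME  COLUMN_NAME  DATA_TYPE  DATA_LENGTH  CHAR_LENGTH  CHAR_USED
    T2          COL1         VARCHAR2           100           25  C
    T2          COL2         VARCHAR2           100           25  C
    T2          COL3         VARCHAR2           100           25  C
    T2          COL4         VARCHAR2          4000         4000  B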