oracle-databasepostgresqlutf-8oracle-fdw

PostgreSQL 9.5 - decode / select case to solve error with utf8 doesn't work


In continuation of my last post - "migration oracle to postgresql invalid byte sequence for encoding “UTF8”: 0x00"

I'm trying to insert into a local PostgreSQL table data from remote Oracle table (via oracle_fdw extension). My Oracle table has a column named street and it has valid string values and sometimes the next invalid (in PostgreSQL) string : ' ' (space).

When I try to copy the column value I get the error I mentioned above and in my last post. I understood that I need to change the oracle data before I insert it to PostgreSQL. I must do it on the fly so I tried to search for oracle decode func in PostgreSQL. I found two solution and I used both of them but I got same error:

1.using select with case :

mydb=>select *,(case when v.street=' ' then null END) from customer_prod v;
ERROR:  invalid byte sequence for encoding "UTF8": 0x00
CONTEXT:  converting column "street" for foreign table scan of 
 "customer_prod", row 254148

2.using decode function from orafce extension :

mydb=>select decode(street,' ',null) from customer_prod;
ERROR:  invalid byte sequence for encoding "UTF8": 0x00

So, I'm still getting the error. How can I solve this issue?


Solution

  • The error occurs when the values are transferred from Oracle to PostgreSQL, so post-processing won't prevent the error.

    For the sake of demonstration, let's create an Oracle table that exhibits the problem:

    CREATE TABLE nulltest(
       id number(5) CONSTRAINT nulltest_pkey PRIMARY KEY,
       val varchar2(10 CHAR)
    );
    
    INSERT INTO nulltest VALUES (1, 'schön');
    INSERT INTO nulltest VALUES (2, 'bö' || CHR(0) || 'se');
    INSERT INTO nulltest VALUES (3, 'egal');
    
    COMMIT;
    

    Let's create a foreign table in PostgreSQL for it:

    CREATE FOREIGN TABLE nulltest (
       id integer OPTIONS (key 'true') NOT NULL,
       val varchar(10)
    ) SERVER oracle
       OPTIONS (table 'NULLTEST');
    
    SELECT * FROM nulltest;
    
    ERROR:  invalid byte sequence for encoding "UTF8": 0x00
    CONTEXT:  converting column "val" for foreign table scan of "nulltest", row 2
    

    Now the easiest thing would be to create a foreign table that filters away the zero characters:

    CREATE FOREIGN TABLE filter_nulltest (
       id integer OPTIONS (key 'true') NOT NULL,
       val varchar(10)
    ) SERVER oracle
       OPTIONS (table '(SELECT id, replace(val, CHR(0), NULL) FROM nulltest)');
    
    SELECT * FROM filter_nulltest;
    
    ┌────┬───────┐
    │ id │  val  │
    ├────┼───────┤
    │  1 │ schön │
    │  2 │ böse  │
    │  3 │ egal  │
    └────┴───────┘
    (3 rows)
    

    Another, less efficient, option would be to create a function that catches and reports bad lines to you so that you can fix them on the Oracle side:

    CREATE OR REPLACE FUNCTION get_nulltest() RETURNS SETOF nulltest
       LANGUAGE plpgsql AS
    $$DECLARE
       v_id integer;
       n nulltest;
    BEGIN
       FOR v_id IN SELECT id FROM nulltest
       LOOP
          BEGIN
             SELECT nulltest.* INTO n
                FROM nulltest
                WHERE id = v_id;
             RETURN NEXT n;
          EXCEPTION
             WHEN OTHERS THEN
                RAISE NOTICE 'Caught error % for id=%: %', SQLSTATE, v_id, SQLERRM;
          END;
       END LOOP;
    END;$$;
    
    SELECT * FROM get_nulltest();
    
    NOTICE:  Caught error 22021 for id=2: invalid byte sequence for encoding "UTF8": 0x00
    ┌────┬───────┐
    │ id │  val  │
    ├────┼───────┤
    │  1 │ schön │
    │  3 │ egal  │
    └────┴───────┘
    (2 rows)