regexblobspecial-charactersoracle-apexfile-import

Handle special characters during textfile import into OracleDB via Apex


I'm working on a tool that imports textfiles into a BLOB column (OracleDB). This is handled via an Apex page with a File Browse button and connected import procedure.

For more details about the import to BLOB procedure: http://ittichaicham.com/2011/03/file-browser-in-apex-4-with-blob-column-specified-in-item-source-attribute/

The textfiles that I'm using contain special characters, null values, decimal seperators etc. For example:

(...) 111888|Overflakkée, Blabla|streetname with Rhône||12-13|UXC Placename (...)

Since it's all character data, I'm converting the BLOB to CLOB with this procedure:

FUNCTION blob_to_clob (blob_in IN BLOB)
RETURN CLOB
AS
v_clob CLOB;
v_varchar VARCHAR2(32767);
v_start PLS_INTEGER := 1;
v_buffer PLS_INTEGER := 32767;
BEGIN
DBMS_LOB.CREATETEMPORARY(v_clob, TRUE);

FOR i IN 1..CEIL(DBMS_LOB.GETLENGTH(blob_in) / v_buffer)
LOOP

   v_varchar := UTL_RAW.CAST_TO_VARCHAR2(DBMS_LOB.SUBSTR(blob_in, v_buffer, v_start));

DBMS_LOB.WRITEAPPEND(v_clob, LENGTH(v_varchar), v_varchar);
v_start := v_start + v_buffer;
END LOOP;
RETURN v_clob;
END blob_to_clob;

See for more info:http://www.dba-oracle.com/t_convert_blob_to_clob_script.htm

The problem: While converting the blob to clob, some of the special characters are lost/altered.

For example, this row:

(...) 111888|Overflakkée, Blabla|streetname with Rhône||12-13|UXC Placename (...)

will become this row:

(...) 111888|Overflakk� Blabla|streetname with Rh�|12-13|UXC Placename (...)

Row length, characters and even seperators (in this case a '|') are altered/not visible.

Regards


Solution

  • You need to do a conversion from the source character set to the character set of the database

    Here is an example I made (mainly for getting big json objects, javascript is utf8, to work with in a 8859p1 database), It's pretty simple so I won't explain it too much.

    example usage with conversion:

    l_clob := blob_to_clob (l_blob, '1');
    

    Function:

    function blob_to_clob (blob_in in blob, p_convertutf8 in char default 0)
       return clob as
       /* Ólafur Tryggvason */
       l_clob           clob;
       l_varchar        varchar2 (32767);
       l_start          pls_integer := 1;
       l_buffer         pls_integer := 32767;
       l_characterset   nls_database_parameters.value%type;
    begin
       select value
         into l_characterset
         from nls_database_parameters
        where parameter = 'NLS_CHARACTERSET';
    
       dbms_lob.createtemporary (l_clob, true);
    
       for i in 1 .. ceil (dbms_lob.getlength (blob_in) / l_buffer) loop
          l_varchar := utl_raw.cast_to_varchar2 (dbms_lob.substr (blob_in, l_buffer, l_start));
    
          if p_convertutf8 = '1' then
             l_varchar := convert (l_varchar, l_characterset, 'UTF8'); -- WE8ISO8859P1
          end if;
    
          dbms_lob.writeappend (l_clob, length (l_varchar), l_varchar);
    
          l_start := l_start + l_buffer;
       end loop;
    
       return l_clob;
    end blob_to_clob;