oracle-databaseclobnclob

Determine length in bytes of CLOB/NCLOB with multibyte charset


I'm working with an Oracle database and want to determine the length in bytes of a NCLOB using a multibyte charset (UTF-8).

LENGTHB() does not support CLOBs oder NCLOBS with multibyte charset. I could convert the NCLOB to a BLOB and get its length then. But isn't there a better way to do this?


Solution

  • Oracle stores CLOB in UTF-16 (or possibly your NCHARACTER_SET?). Each character is stored as two bytes.

    Here's how you can see the raw data:

    SQL> CREATE TABLE test_clob (c CLOB, v VARCHAR2(10 CHAR), nv NVARCHAR2(10));
    
    Table created
    
    SQL> INSERT INTO test_clob VALUES ('0123456789', '0123456789', '0123456789');
    
    1 row inserted
    
    SQL> SELECT dbms_rowid.rowid_relative_fno(ROWID),
      2         dbms_rowid.rowid_block_number(ROWID)
      3    FROM test_clob;
    
    DBMS_ROWID.ROWID_RELATIVE_FNO( DBMS_ROWID.ROWID_BLOCK_NUMBER(
    ------------------------------ ------------------------------
                                13                          94314
    
    SQL> alter system dump datafile 13 block 94314;
    
    System altered
    

    Navigate to your USER_DUMP_DEST directory and open the trace file, you should see something like this:

    col  0: [56]
     00 54 00 01 02 0c 80 00 00 02 00 00 00 01 00 00 03 64 c6 a5 00 24 09 00 00
     00 00 00 00 14 00 00 00 00 00 01 00 30 00 31 00 32 00 33 00 34 00 35 00 36
     00 37 00 38 00 39
    LOB
    Locator:
      Length:        84(56)
      Version:        1
      Byte Length:    2
      LobID: 00.00.00.01.00.00.03.64.c6.a5
      Flags[ 0x02 0x0c 0x80 0x00 ]:
        Type: CLOB 
        [...]
        Inline data[20]
        [...]
    col  1: [10]  30 31 32 33 34 35 36 37 38 39
    col  2: [20]  00 30 00 31 00 32 00 33 00 34 00 35 00 36 00 37 00 38 00 39
    

    As you can see the CLOB (column 0) is composed of a few header bytes and the same byte raw data as the UTF-16 NVARCHAR2 column.

    As such I think you will have to convert your CLOB to UTF-8 to determine its length in this character set.

    Here's an example I've used using DBMS_LOB.converttoblob:

    SQL> DECLARE
      2     l_clob         CLOB := 'abcdéfghij'; -- the é will take two bytes!
      3     l_blob         BLOB;
      4     l_dest_offset  NUMBER := 1;
      5     l_src_offset   NUMBER := 1;
      6     l_lang_context NUMBER := 0;
      7     l_warning      NUMBER;
      8  BEGIN
      9     dbms_lob.createtemporary(l_blob, FALSE, dbms_lob.call);
     10     dbms_lob.converttoblob(dest_lob     => l_blob,
     11                            src_clob     => l_clob,
     12                            amount       => dbms_lob.lobmaxsize,
     13                            dest_offset  => l_dest_offset,
     14                            src_offset   => l_src_offset,
     15                            blob_csid    => nls_charset_id('AL32UTF8'),
     16                            lang_context => l_lang_context,
     17                            warning      => l_warning);
     18     dbms_output.put_line('byte length:'||dbms_lob.getlength(l_blob));
     19     dbms_lob.freetemporary(l_blob);
     20  END;
     21  /
    
    byte length:11
    
    PL/SQL procedure successfully completed
    

    You can convert to any character set using the function nls_charset_id.