c++coracle-databaseoracle-call-interfacenls-lang

How do I use OCIEnvNlsCreate() to always get CHAR and NCHAR data back in UTF8 encoding?


Currently I'm using OCIEnvCreate() to create an OCI session handle to communicate with Oracle databases. I'd like to explicitly use UTF8 rather than relying on whatever client locale has been set, and I gather that I need to use OCIEnvNlsCreate() to do this. But there's something I don't understand. Here's the signature for OCIEnvNlsCreate():

sword OCIEnvNlsCreate   ( OCIEnv        **envhpp,
                          ub4           mode,
                          dvoid         *ctxp,
                          dvoid         *(*malocfp)
                                             (dvoid *ctxp,
                                              size_t size),
                          dvoid         *(*ralocfp)
                                             (dvoid *ctxp,
                                              dvoid *memptr,
                                              size_t newsize),
                          void          (*mfreefp)
                                             (dvoid *ctxp,
                                              dvoid *memptr))
                          size_t        xtramemsz,
                          dvoid         **usrmempp
                          ub2           charset,
                          ub2           ncharset );

Notice charset and ncharset are integral types, not strings. So I'm guessing that I need to specify an NLS ID? So where are these NLS IDs? They're not in the OCI headers anywhere - I've grep'ed them pretty thoroughly. I know what some of the strings that should show up in NLS_LANG are - stuff like "CL8MACCYRILLIC" and "TR8PC857" - but their IDs don't seem to be published anywhere?

I've trawled through IDs 1-999 with OCINlsCharSetIdToName() which tells me that UTF8 is 871, but I have a queasy feeling about hard-coding that given that Oracle decided to not document this or make it public? And if I always use OCINlsCharSetNameToId( handle, "UTF8" ), I have to create a dummy session handle first (with OCIEnvCreate() or OCIEnvNlsCreate()), call OCINlsCharSetNameToId(), close the dummy session handle, and then call OCIEnvNlsCreate() again with the NLS ID?

Is this really the way this is supposed to work??? I must have this wrong...?


Solution

  • Either try to call setenv() in C++ code before calling OCIEnvCreate().

    Or check Metalink NOTE.93358.1 SCRIPT: Where to Find Specifications of Character Encoding:

    Create the "dectohex" function first by referencing the above Note:67533.1 prior to running the query below.

    set pages 1000
    col nls_charset_id for 9999
    col hex for a10
    col value for a20
    select nls_charset_id(value) nls_charset_id,
    base_convert.dec_to_hex(nls_charset_id(value)) hex, value
    from v$nls_valid_values
    where parameter = 'CHARACTERSET'
    order by nls_charset_id(value);
    
    NLS_CHARSET_ID HEX        VALUE
    -------------- ---------- --------------------
    1              1          US7ASCII
    ...
    
    
    //Note: the characterset constant number for UTF8 = 871 can be retrieved in NOTE.93358.1 SCRIPT: Where to Find Specifications of Character Encoding
    ub2 cs = 871, ncs =871;
    sword res = OCICALL(OCIEnvNlsCreate(&_handle, oci_mode, 0/*ctxp*/, 0, 0, 0, 0/*xtramem_sz*/, 0/*usrmempp*/, cs, ncs));
    

    IMHO it is safe to be hardcoded. The same value is also hardcoded on DB side. PS: Conversion into target charset is performed on DB client side. Oracle Instant client libs support only US7ASCII and AL32UTF8 charsets.