Tags: javascript, c++, c, casting, fromcharcode

How to handle negative or unsigned char when porting from C/C++ to JavaScript?


I'm trying to port an old C++ lexer (source) to JavaScript and am struggling a bit because of my limited understanding of C/C++.

I have a parameter c which, as I currently see it, could either be an index of the position in a chunk of the input file I'm parsing (*yy_cp) or the actual character (including NUL) stored at that address. I need to use c as an index into a lookup table. The lexer does this:

/* Promotes a possibly negative, possibly signed char to an
 * unsigned integer for use as an array index.  If the signed char
 * is negative, we want to instead treat it as an 8-bit unsigned
 * char, hence the double cast.
 */
#define YY_SC_TO_UI(c) ((unsigned int) (unsigned char) c)

and calls it like this:

register YY_CHAR yy_c = yy_ec[YY_SC_TO_UI(*yy_cp)];

which stores a value from the lookup table yy_ec (which contains 256 entries, so I assume extended ASCII) in yy_c. The index to look up is produced by YY_SC_TO_UI, and that's where I'm lost porting this to JavaScript. YY_SC_TO_UI has to return a number between 0 and 255, so do I just take what I have and do:

 "[c]".charCodeAt(0)

or is there anything else I need to be aware of when handling a "possibly negative, possibly signed char" in JS?

Thanks.


Solution

  • Depending on the compiler, char can be signed or unsigned. Presumably the author wanted this to work the same way with both, and to make sure the value is always zero-extended, not sign-extended, when converting from char to unsigned int. It's a safe way to guarantee the value is 0..255 rather than -128..127.

    According to MDN, the range of charCodeAt's return value is larger:

    The charCodeAt() method returns an integer between 0 and 65535...

    How you handle values outside that range depends on your input, but one alternative is simple bit masking:

    "€".charCodeAt(0) & 0xff;