
Alternative character for decimal point '.'


I have a requirement to encode a signed decimal number using only a-z, underscore, dash, and 0-9. No other special characters are allowed.

The decimal number is often preceded by text and an underscore. A dash in front of the number is a minus sign and marks a negative value.

Given:

tree_-0.125 and flower_21.875

Potential transformations:

Use lowercase o:

tree_-0.125 -> tree_-0o125

flower_21.875 -> flower_21o875

Use lowercase d as in decimal:

tree_-0.125 -> tree_-0d125

flower_21.875 -> flower_21d875

Use lowercase f as in float:

tree_-0.125 -> tree_-0f125

flower_21.875 -> flower_21f875

Use an underscore:

tree_-0.125 -> tree_-0_125

flower_21.875 -> flower_21_875
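To compare the candidates, it may help to check how each one round-trips. The following is only a minimal sketch: the function names `encode` and `decode`, the `sep` parameter, and the regular expression are assumptions made for illustration, not part of the requirement.

```python
import re
from decimal import Decimal
from typing import Tuple


def encode(label: str, value: Decimal, sep: str = "d") -> str:
    """Build 'label_number', writing `sep` where the decimal point would be."""
    text = format(value, "f")  # plain notation, never an exponent
    return f"{label}_{text.replace('.', sep)}"


def decode(token: str, sep: str = "d") -> Tuple[str, Decimal]:
    """Split 'label_number' and turn `sep` back into a decimal point."""
    # Assumed grammar: label, underscore, optional dash, digits,
    # optionally followed by the separator and more digits.
    pattern = r"^(?P<label>.*)_(?P<number>-?\d+(?:%s\d+)?)$" % re.escape(sep)
    match = re.match(pattern, token)
    if match is None:
        raise ValueError(f"not a valid token: {token!r}")
    number = match.group("number").replace(sep, ".")
    return match.group("label"), Decimal(number)


if __name__ == "__main__":
    for sep in ("o", "d", "f", "_"):
        token = encode("flower", Decimal("21.875"), sep)
        print(sep, token, decode(token, sep))
```

With this particular pattern, the letter separators decode back to the original label and value, while the underscore variant does not: `flower_21_875` is read as the label `flower_21` followed by the integer 875. That suggests the underscore option needs an extra rule (for example, labels may never end in digits) before it can be parsed unambiguously.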

Human readability is important as many others will use this syntax. Any suggestions or votes for a particular syntax are encouraged.


Solution

  • Ok, here's the thing:

    requirement to use only a-z, underscore, dash and 0-9

    and

    Human readability is important as many others will use this syntax

    are absolutely and obviously conflicting.

    Don't let yourself be forced to do something like this. What's the use case? Who are you going to force to use this syntax?

    I promise you'll be seeing human<->syntax conversion tools as soon as you release something like that into the wild, so you might as well use an arbitrary mapping of bytes to az_/09-characters and allow any UTF-8 character.

    So I vote for this solution:

    1. a-z, _, -, 0-9 are 38 characters, slightly more than 32, so each symbol can carry five bits. Awesome. Take 8 of these 5-bit symbols and you have a 5-byte word (40 bits).
    2. Encode the text you want to write down as UTF-8, and store the numeric values as 32-bit floats. Keep the resulting byte sequence in memory, with a 16-bit integer length field in front of the data and padding to a multiple of 5 bytes after it.
    3. Using the mapping from 1., build a bytes->symbols converter.
    4. Don't bother with arbitrary, inhuman regulations. This is not the 1960s: memory is cheap, and people prefer to read actual text because they are not parsers.
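The steps above could be sketched roughly as follows. This is an illustration under several assumptions that the answer leaves open: the 32-symbol alphabet (a-z plus 0-5), big-endian byte order, a length field that counts payload bytes, and a payload layout of UTF-8 text followed by one 32-bit float. The names `pack`, `unpack`, `to_symbols` and `from_symbols` are hypothetical.

```python
import struct
from typing import Tuple

# Assumed alphabet: 32 of the 38 permitted characters, 5 bits per symbol.
ALPHABET = "abcdefghijklmnopqrstuvwxyz012345"


def pack(label: str, value: float) -> bytes:
    """Frame UTF-8 text plus a 32-bit float with a 16-bit length prefix."""
    payload = label.encode("utf-8") + struct.pack(">f", value)
    framed = struct.pack(">H", len(payload)) + payload
    # Pad with zero bytes up to a multiple of 5 bytes (one 40-bit word).
    return framed + b"\x00" * (-len(framed) % 5)


def unpack(data: bytes) -> Tuple[str, float]:
    """Reverse pack(): read the length, then split text and float."""
    (length,) = struct.unpack(">H", data[:2])
    payload = data[2:2 + length]
    (value,) = struct.unpack(">f", payload[-4:])
    return payload[:-4].decode("utf-8"), value


def to_symbols(data: bytes) -> str:
    """Map every 5-byte word to 8 symbols of 5 bits each."""
    out = []
    for i in range(0, len(data), 5):
        word = int.from_bytes(data[i:i + 5], "big")
        for shift in range(35, -1, -5):
            out.append(ALPHABET[(word >> shift) & 0x1F])
    return "".join(out)


def from_symbols(text: str) -> bytes:
    """Map every group of 8 symbols back to a 5-byte word."""
    out = bytearray()
    for i in range(0, len(text), 8):
        word = 0
        for ch in text[i:i + 8]:
            word = (word << 5) | ALPHABET.index(ch)
        out += word.to_bytes(5, "big")
    return bytes(out)


if __name__ == "__main__":
    encoded = to_symbols(pack("tree", -0.125))
    print(encoded, unpack(from_symbols(encoded)))
```

The output uses only permitted characters and survives a round trip, but it is unreadable without a converter, which is the point the answer is making about the conflict between the two requirements.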