
How is the length of an unbounded string encoded in ASN.1 UPER when the length exceeds 127?


If I have the ASN.1 schema as shown below and I want to encode

message "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

following the UPER rules:

World-Schema DEFINITIONS AUTOMATIC TAGS ::= 
BEGIN
  Rocket ::= SEQUENCE       
  {
     message   UTF8String 
  }                                                     
END

I get the following encoding (hex):

8118616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161

Now, focusing on just the first 16 bits, which are

1000 0001 0001 1000

These specify the length of the string as 280. Why is the length encoded in 15 bits rather than all 16? The remaining bits after the first 16 are just octets with the value 97 (0x61, which is 'a' in ASCII).


Solution

  • In this case, the length is actually encoded in 14 bits, not 15. It uses option b) below, in which the first two bits are '10' and the remaining 14 bits hold the length: 00 0001 0001 1000 = 0x118 = 280. See Rec. ITU-T X.691 | ISO/IEC 8825-2, clause 11.9, which contains the following:

    11.9       General rules for encoding a length determinant

    NOTE 1 – (Tutorial) The procedures of this subclause are invoked when an explicit length field is needed for some part of the encoding regardless of whether the length count is bounded above (by PER-visible constraints) or not. The part of the encoding to which the length applies may be a bit string (with the length count in bits), an octet string (with the length count in octets), a known-multiplier character string (with the length count in characters), or a list of fields (with the length count in components of a sequence-of or set-of).

    NOTE 2 – (Tutorial) In the case of the ALIGNED variant if the length count is bounded above by an upper bound that is less than 64K, then the constrained whole number encoding is used for the length. For sufficiently small ranges the result is a bit-field, otherwise the unconstrained length ("n" say) is encoded into an octet-aligned bit-field in one of three ways (in order of increasing size):

    a)      ("n" less than 128) a single octet containing "n" with bit 8 set to zero;

    b)      ("n" less than 16K) two octets containing "n" with bit 8 of the first octet set to 1 and bit 7 set to zero;

    c)      (large "n") a single octet containing a count "m" with bit 8 set to 1 and bit 7 set to 1. The count "m" is one to four, and the length indicates that a fragment of the material follows (a multiple "m" of 16K items). For all values of "m", the fragment is then followed by another length encoding for the remainder of the material.

    NOTE 3 – (Tutorial) In the UNALIGNED variant, if the length count is bounded above by an upper bound that is less than 64K, then the constrained whole number encoding is used to encode the length in the minimum number of bits necessary to represent the range. Otherwise, the unconstrained length ("n" say) is encoded into a bit‑field in the manner described above in Note 2.
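
    To make forms a) and b) concrete, here is a minimal Python sketch of the unconstrained length determinant. The function names encode_length and decode_length are hypothetical, not from any ASN.1 library. It assumes the length field starts on a byte boundary (true for this message, since the SEQUENCE has no optional components or extension marker and so no preamble bits; in general a UPER length can start at any bit offset), and it does not implement the fragmentation of form c).

```python
from typing import Tuple


def encode_length(n: int) -> bytes:
    """Encode an unconstrained length determinant (forms a and b only)."""
    if n < 0:
        raise ValueError("length must be non-negative")
    if n < 128:
        # Form a: one octet, top bit 0, 7 bits of length.
        return bytes([n])
    if n < 16384:
        # Form b: two octets, top bits '10', 14 bits of length.
        return bytes([0x80 | (n >> 8), n & 0xFF])
    raise NotImplementedError("form c (fragmentation) is not covered here")


def decode_length(data: bytes) -> Tuple[int, int]:
    """Return (length, number of bits consumed) for forms a and b."""
    first = data[0]
    if first & 0x80 == 0:
        # Leading '0': the remaining 7 bits are the length.
        return first, 8
    if first & 0xC0 == 0x80:
        # Leading '10': the remaining 14 bits are the length.
        return ((first & 0x3F) << 8) | data[1], 16
    raise NotImplementedError("form c (fragmentation) is not covered here")


# The header from the question: 1000 0001 0001 1000
header = bytes.fromhex("8118")
length, bits_used = decode_length(header)

# Rebuild the start of the message: length determinant, then the 'a' octets.
message = encode_length(280) + b"a" * 280
```

    Here decode_length(header) yields a length of 280 from a 16-bit field, of which only the low 14 bits carry the value, and message.hex() begins with 811861, matching the encoding in the question.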