
Decode using ASN.1 where substrate contains some opaque data


I would like to use pyasn1 to decode some data, part of which is opaque. That is, part of the data contained in the ASN.1-defined structure may or may not be ASN.1 decode-able, and I need to parse the preamble to find out how to decode it.

Based on what I understand from the pyasn1 codec documentation on "Decoding untagged types," I should be able to use the pyasn1.type.univ.Any type to handle this case.

Here is some example code to illustrate the problem I'm having.

#!/usr/bin/env python

from pyasn1.type import univ, namedtype
from pyasn1.codec.der import decoder, encoder

class Example(univ.Sequence):
    componentType = namedtype.NamedTypes(
        namedtype.NamedType('spam', univ.Integer()),
        namedtype.NamedType('eggs', univ.Any())
    )

example = Example()
example['spam'] = 42
example['eggs'] = univ.Any(b'\x01\x00abcde') # Some opaque data
substrate = encoder.encode(example)

"""
    >>> import binascii
    >>> print(binascii.hexlify(substrate).decode('ascii'))
    300a02012a01006162636465

      ^^      ^
      ||      + Opaque data begins here
      ++ Note: the length field accounts for all remaining substrate
"""

data, tail = decoder.decode(substrate, asn1Spec=Example())
print(data)

The encoded example is consistent with my expectations. However, this program fails inside the decoder with the following traceback.

Traceback (most recent call last):
  File "./any.py", line 27, in <module>
    data, tail = decoder.decode(substrate, asn1Spec=Example())
  File "/Users/neirbowj/Library/Python/3.4/lib/python/site-packages/pyasn1-0.1.8-py3.4.egg/pyasn1/codec/ber/decoder.py", line 825, in __call__
  File "/Users/neirbowj/Library/Python/3.4/lib/python/site-packages/pyasn1-0.1.8-py3.4.egg/pyasn1/codec/ber/decoder.py", line 342, in valueDecoder
  File "/Users/neirbowj/Library/Python/3.4/lib/python/site-packages/pyasn1-0.1.8-py3.4.egg/pyasn1/codec/ber/decoder.py", line 706, in __call__
pyasn1.error.SubstrateUnderrunError: 95-octet short

I believe the decoder is trying to decode the portion of the data I marked as univ.Any and failing because it is not a valid encoding, rather than returning it to me as binary data encapsulated in a univ.Any object, as I expected.
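
A rough way to see where the "95-octet short" figure may come from is to walk the opaque bytes the way a BER decoder would. This is only a sketch in plain Python, not pyasn1's actual code path; the read_tlv_header helper is a hypothetical name and handles short-form lengths only.

```python
def read_tlv_header(buf, offset=0):
    """Return (tag_octet, length, header_size) for a short-form BER header.

    Hypothetical helper for illustration; not part of pyasn1.
    """
    tag = buf[offset]
    length = buf[offset + 1]
    if length & 0x80:
        raise ValueError("long-form lengths are not handled in this sketch")
    return tag, length, 2

opaque = b'\x01\x00abcde'

# First "TLV": tag 0x01, length 0, so only two octets are consumed.
tag1, length1, hdr1 = read_tlv_header(opaque)
rest = opaque[hdr1 + length1:]           # b'abcde'

# The decoder then sees 'a' (0x61) as a tag and 'b' (0x62 == 98) as a length.
tag2, length2, hdr2 = read_tlv_header(rest)
available = len(rest) - hdr2             # only b'cde' is left
shortfall = length2 - available
print(hex(tag2), length2, available, shortfall)
```

The shortfall computed this way matches the number in the traceback, which is consistent with the decoder treating the bytes after 01 00 as another tag-length-value triple.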

How can I parse data of this form using pyasn1?

Incidentally, the actual data I am trying to decode is a SASL token using the GSSAPI mechanism, as defined in section 4.1 of RFC 4121: KRB5 GSSAPI mechanism v2, which I excerpt here for convenience.

     GSS-API DEFINITIONS ::=

     BEGIN

     MechType ::= OBJECT IDENTIFIER
     -- representing Kerberos V5 mechanism

     GSSAPI-Token ::=
     -- option indication (delegation, etc.) indicated within
     -- mechanism-specific token
     [APPLICATION 0] IMPLICIT SEQUENCE {
             thisMech MechType,
             innerToken ANY DEFINED BY thisMech
                -- contents mechanism-specific
                -- ASN.1 structure not required
             }

     END

The innerToken field starts with a two-octet token-identifier
(TOK_ID) expressed in big-endian order, followed by a Kerberos
message.

Following are the TOK_ID values used in the context establishment
tokens:

      Token               TOK_ID Value in Hex
     -----------------------------------------
      KRB_AP_REQ            01 00
      KRB_AP_REP            02 00
      KRB_ERROR             03 00
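
Once the innerToken bytes are in hand, the table above maps onto a two-octet big-endian prefix. A minimal sketch of the split, assuming the bytes have already been extracted from the outer token (split_inner_token and TOK_IDS are illustrative names, not taken from any library):

```python
import struct

# TOK_ID values from the RFC 4121 excerpt, read as big-endian 16-bit integers.
TOK_IDS = {
    0x0100: 'KRB_AP_REQ',
    0x0200: 'KRB_AP_REP',
    0x0300: 'KRB_ERROR',
}

def split_inner_token(inner):
    """Return (token_name, kerberos_message) for raw innerToken bytes."""
    if len(inner) < 2:
        raise ValueError("innerToken is too short to hold a TOK_ID")
    (tok_id,) = struct.unpack('>H', inner[:2])
    # Unknown identifiers are reported rather than rejected in this sketch.
    return TOK_IDS.get(tok_id, 'unknown'), inner[2:]

name, message = split_inner_token(b'\x01\x00abcde')
print(name, message)
```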

EDIT 1: Attach sample data

Here is a sample GSSAPI-Token (lightly sanitized) that was serialized, I believe, by cyrus-sasl and heimdal.

YIIChwYJKoZIhvcSAQICAQBuggJ2MIICcqADAgEFoQMCAQ6iBwMFACAAAACjggFm
YYIBYjCCAV6gAwIBBaELGwlBU04uMVRFU1SiNjA0oAMCAQGhLTArGwtzZXJ2aWNl
bmFtZRscc2VydmljZWhvc3QudGVzdC5leGFtcGxlLmNvbaOCARAwggEMoAMCARCh
AwIBBKKB/wSB/A81akUNsyvRCCKtERWg9suf96J3prMUQkabsYGpzijfEeCNe0ja
Eq6c87deBG+LeJqFIyu65cCMF/oXtyZNB9sUxpqFBcfkAYZXTxabNLpZAUmkdt6w
dYlV8JK/G3muuG/ziM14oCbh8hIY63oi7P/Pdyrs3s8B+wkNCpjVtREHABuF6Wjx
GYem65mPqCP9ZMSyD3Bc+dLemxhm7Kap8ExoVYFRwuFqvDf/E5MLCk2HThw46UCF
DqFnU46FJBNGAK+RN2EptsqtY48gb16klqJxU7bwHeYoCsdXyB6GElIDe1qrPU15
9mGxpdmSElcVxB/3Yzei48HzlkUcfqSB8jCB76ADAgEQooHnBIHkZUyd0fJO3Bau
msqz6ndF+kBxmrGS6Y7L20dSYDI2cB8HsJdGDnEODsAAcYQ0L5c2N/mb8QHh7iU9
gtjWHpfq/FqMF4/aox/BJ0Xzuy2gS4sCafs7PTYtSDh2nyLkNYuxKdmQ1ughbIq6
APAegqa7R1iv2oCaNijrpKc2YUfznnwT/CTSsGrJpMwz4KLuBtjI4f74bQty8uNn
LVxxV4J8wU1s7lSj4Ipbi+a1WdCVsLs8lIqFmKXte+1c+qHeadoAGmSTBT3qFZae
SRdT8dpYr6i6fkjRsoyEZs9ZqQtwQAYSdMBU
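
The framing of this sample can be inspected without pyasn1 by reading the outer header by hand. The sketch below decodes only the first 28 base64 characters of the sample; read_ber_length and split_gssapi_token are hypothetical helpers, and the code assumes definite-length encoding and a single-octet tag for each element.

```python
import base64

def read_ber_length(buf, offset):
    """Return (length, next_offset) for a definite-form BER length."""
    first = buf[offset]
    if first < 0x80:
        return first, offset + 1
    count = first & 0x7F
    if count == 0:
        raise ValueError("indefinite lengths are not handled in this sketch")
    value = int.from_bytes(buf[offset + 1:offset + 1 + count], 'big')
    return value, offset + 1 + count

def split_gssapi_token(token):
    """Return (declared_length, oid_content_octets, inner_token_bytes)."""
    if token[0] != 0x60:                      # [APPLICATION 0], constructed
        raise ValueError("not a GSSAPI-Token")
    total, pos = read_ber_length(token, 1)
    end = pos + total                         # end of the outer value
    if token[pos] != 0x06:                    # OBJECT IDENTIFIER (thisMech)
        raise ValueError("expected thisMech OID")
    oid_len, pos = read_ber_length(token, pos + 1)
    oid = token[pos:pos + oid_len]
    inner = token[pos + oid_len:end]          # opaque innerToken, left as-is
    return total, oid, inner

# Only the beginning of the sample, so inner is truncated here.
prefix = base64.b64decode("YIIChwYJKoZIhvcSAQICAQBuggJ2")
total, oid, inner = split_gssapi_token(prefix)
print(total, oid.hex(), inner[:2])
```

In this prefix the innerToken starts with 01 00, the TOK_ID for KRB_AP_REQ, placed directly after the OID with no tag or length of its own.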

Solution

  • My impression is that a value of the ANY type can only contain a valid BER/DER serialization. Think of the ANY type as a CHOICE type with an infinite number of alternatives (see the documentation chapter on the ANY type).

    My first instinct is to put innerToken into an OCTET STRING like this:

    class Example(univ.Sequence):
        componentType = namedtype.NamedTypes(
            namedtype.NamedType('spam', univ.Integer()),
            namedtype.NamedType('eggs', univ.OctetString())
        )
    

    which would give you ready-made values upon decoding:

    >>> example = Example()
    >>> example['spam'] = 42
    >>> example['eggs'] = b'\x01\x00abcde'
    >>> print(example.prettyPrint())
    Example:
     spam=42
     eggs=0x01006162636465
    >>> substrate = encoder.encode(example)
    >>> data, tail = decoder.decode(substrate, asn1Spec=Example())
    >>> print(data.prettyPrint())
    Example:
     spam=42
     eggs=0x01006162636465
    

    On the other hand, if you literally used the TOK_ID values from the spec:

    KRB_AP_REQ            01 00
    KRB_AP_REP            02 00
    KRB_ERROR             03 00
    

    they would look like valid DER serializations (a tag octet followed by a zero length) that could be decoded with your original Example spec:

    >>> KRB_AP_REQ = b'\x01\x00'
    >>> KRB_AP_REP = b'\x02\x00'
    >>> KRB_ERROR = b'\x03\x00'
    >>> class Example(univ.Sequence):
    ...     componentType = namedtype.NamedTypes(
    ...         namedtype.NamedType('spam', univ.Integer()),
    ...         namedtype.NamedType('eggs', univ.Any()),
    ...         namedtype.NamedType('ham', univ.Any()),
    ...     )
    ... 
    >>> example = Example()
    >>> example['spam'] = 42
    >>> example['eggs'] = KRB_AP_REQ
    >>> # obtain the DER serialization for the ANY value that follows
    >>> example['ham'] = encoder.encode(univ.Integer(24))
    >>> print(example.prettyPrint())
    Example:
     spam=42
     eggs=0x0100
     ham=0x020118
    >>> substrate = encoder.encode(example)
    >>> data, tail = decoder.decode(substrate, asn1Spec=Example())
    >>> print(data.prettyPrint())
    Example:
     spam=42
     eggs=0x0100
     ham=0x020118
    >>> data['eggs'].asOctets()
    b'\x01\x00'
    >>> data['eggs'].asNumbers()
    (1, 0)
    >>> example['eggs'] == KRB_AP_REQ
    True
    

    But that is a form of cheating and may not work for arbitrary innerToken values.

    So what does a GSSAPI-Token serialization produced by other tools look like?
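
One caveat about the OCTET STRING variant is that it changes the bytes on the wire. The hand-rolled sketch below builds both encodings of the Example structure for comparison; der_tlv is a hypothetical helper that supports short-form lengths only and is not part of pyasn1.

```python
import binascii

def der_tlv(tag, content):
    """Encode one tag-length-value triple (short-form length only)."""
    if len(content) >= 0x80:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([tag, len(content)]) + content

opaque = b'\x01\x00abcde'
spam = der_tlv(0x02, b'\x2a')                      # INTEGER 42

# 'eggs' as ANY: the opaque bytes are embedded verbatim.
as_any = der_tlv(0x30, spam + opaque)
# 'eggs' as OCTET STRING: the opaque bytes gain an 04 07 header.
as_octets = der_tlv(0x30, spam + der_tlv(0x04, opaque))

print(binascii.hexlify(as_any).decode('ascii'))
print(binascii.hexlify(as_octets).decode('ascii'))
```

The first line reproduces the substrate shown in the question, while the second carries an extra tag and length. That suggests a spec using OCTET STRING would only decode data that was also encoded that way, not tokens that embed the opaque bytes directly.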