Tags: python, parsing, barcode, zlib, aztec-barcode

Need help decompressing zlib data stored in Aztec barcode (Deutsche Bahn Ticket)


Premise

I'm trying to decode the data from the barcode format currently used on tickets issued by Deutsche Bahn (the German railway). I have found this very useful website (in German) that already does a similar thing and offers a Python script.

The website states that the data is compressed with zlib, the resulting blob is signed with DSA and all of it is stored in the barcode (Aztec format). Example of such a barcode

Problem

I have used the script provided on the website to successfully decode a ticket. I installed the python-pyasn1 library, read the barcode (using BCTester as per the instructions, since I had some trouble with the NeoReader app) and converted the result to hex. I then saved the hex data as a plain text file (which the script requires for some reason) and parsed the file with the script. It worked.

But the script is doing too much. I'd like to do the parsing myself, but I can't get the zlib decompression to work and I understand too little of the code to make sense of it. I know almost no Python. I have some programming experience, though.

If you simply look at the data from the barcode, it looks like this: https://gist.github.com/oelna/096787dc18596aaa4f5f

The first question would be: What is the DSA signature and do I need to split it from the actual compressed data first?

The second: What could a simple Python script look like that reads the barcode blob from a file and simply decompresses it, so I can further parse the format? I had something in mind like

#!/usr/bin python

import zlib

ticket = open('ticketdata.txt').read()

print zlib.decompress(ticket)

but it's not working. Any hint in the right direction would be appreciated.

Here is the hex data that is readable by the script if saved to a file:

23 55 54 30 31 30 30 38 30 30 30 30 30 31 30 2c 02 14 1c 3d e9 2d cd 5e c4 c0 56 bd ae 61 3e 54 ad a1 b3 26 33 d2 02 14 40 75 03 d0 cf 9c c1 f5 70 58 bd 59 50 a7 af c5 eb 0a f4 74 00 00 00 00 30 32 37 31 78 9c 65 50 cb 4e c3 30 10 e4 53 2c 71 43 4a d9 f5 2b 36 b7 84 04 52 01 55 51 40 1c 51 01 23 2a 42 0e 21 15 3f c7 8d 1f 63 36 11 52 2b 7c f1 78 76 76 66 bd f7 8f 4d 5d 54 c4 44 ce 10 05 d2 eb 78 5b ac 32 7b b4 77 c8 11 6b 62 c7 d6 79 aa ea aa 16 e1 b2 22 4d c4 01 ad 36 58 61 ca 6b 30 c6 e5 64 a0 b6 97 0f a6 a9 6f d6 71 df c7 cf 3e 7f 37 93 66 8e c6 71 de 92 4c c0 e1 22 0d fd 57 7a cb ee b6 cf ef 69 54 fd 66 44 05 31 d0 03 18 01 05 40 04 70 9c 51 46 ad 38 49 33 00 86 20 dd 42 88 04 22 5f a6 a1 db f6 78 79 d4 79 95 76 1f 3f df fd e7 98 86 16 b1 30 0b 65 d6 3c bd 2a 15 ce d8 ab e5 79 9d 47 7b da 34 13 c7 34 73 5a 6b 0b 35 72 d9 5c 0d bb ae 53 aa e8 5f 86 b4 01 e9 25 8d 0d 50 8e 72 3c 39 3c b2 13 94 82 74 ce 2d c7 b3 41 8b ed 4c 9f f5 0b e2 85 6c 01 8c fe c7 b8 e9 87 8c d9 f1 90 28 a3 73 fe 05 6d de 5f f1
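One thing to watch for with a dump like this: zlib works on raw bytes, so the hex text has to be converted back to bytes before any decompression attempt. A minimal sketch in Python 3, using only the first eight values from the dump above (the variable names are just for illustration):

```python
# First eight hex values of the dump above, as they appear in the text file.
hex_text = "23 55 54 30 31 30 30 38"

# bytes.fromhex() skips the spaces and returns the raw bytes.
raw = bytes.fromhex(hex_text)

print(raw)       # the start of the identifier, as bytes
print(len(raw))  # 8 bytes, not 23 characters of text
```

The same call should work on the whole file contents, assuming the file holds nothing but space-separated hex pairs.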

Update/Solution:

Mark Adler's tip set me on the right track. It took me hours, but I hacked together a working solution to this particular problem. If I had been smarter, I would have recognized the zlib header 78 9C at offset 68. Simply split the data at this point and the second half decompresses without complaint. Be warned, very sad Python:

import zlib

ZLIB_OFFSET = 68  # position of the zlib header (78 9C) in this ticket

# ticketdata.txt must contain the raw barcode bytes here, not the hex text
with open('ticketdata.txt', 'rb') as fp:
    data = fp.read()

# everything before the zlib header is the identifier and the DSA signature
dsa_signature = data[:ZLIB_OFFSET]
zlib_data = data[ZLIB_OFFSET:]

print("\nSignature:")
print("%s\n" % dsa_signature)
print("\nCompressed data:")
print("%s\n" % zlib_data)
print("\nDecoded:")
print(zlib.decompress(zlib_data))

If there is an easier solution to this, feel free to comment. I'll keep working on this a little longer and try to make it more robust, so that it actively seeks out the zlib header instead of hardcoding the offset. The first half is an identifier code, like #UT010080000060,, followed by an ASN.1 DSA signature, which luckily I don't need to verify or modify.
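A rough sketch of what such a header search could look like, in Python 3. The function name find_zlib_stream and the sample blob are made up for illustration; the check relies on the zlib format rule (RFC 1950) that the first byte's low nibble is 8 for deflate and that the two header bytes, read as a 16-bit big-endian number, are a multiple of 31. Because other bytes can pass that check by chance, each candidate is confirmed by actually trying to decompress from it.

```python
import zlib

def find_zlib_stream(blob):
    """Return (offset, decompressed bytes) for the first zlib stream in blob."""
    for offset in range(len(blob) - 1):
        cmf, flg = blob[offset], blob[offset + 1]
        # Cheap header check: deflate method and header checksum (multiple of 31).
        if cmf & 0x0F != 8 or ((cmf << 8) | flg) % 31 != 0:
            continue
        try:
            # Confirm the candidate by decompressing from this position.
            return offset, zlib.decompress(blob[offset:])
        except zlib.error:
            continue  # false positive, keep scanning
    raise ValueError("no zlib stream found")

# Synthetic stand-in for a ticket: 68 filler bytes, then a compressed payload.
sample = b"#" * 68 + zlib.compress(b"U_HEAD example payload")
offset, payload = find_zlib_stream(sample)
print(offset, payload)
```

On real ticket data this assumes, as in the example here, that the compressed stream runs to the end of the blob.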


Solution

  • There is a complete and valid zlib stream starting at offset 68 in your hex data, and going to the end. It decompresses to:

    U_HEAD01005300802P9QAN-4������������0501201514560DEDE0080ID0200180104840080BL020357031204GW3HEMP9�����������06012015060120151021193517S0010018Fernweh-Ticket natS00200012S0030001AS00900051-0-0S01200010S0140002S2S0150006BerlinS0160011NeumünsterS0210038B-Hbf 8:16 ICE794/HH-Hbf 10:16 IC2224S0230013Krull AndreaS026000213S0270019***************0484S0280013Andrea#Krull S031001006.01.2015S032001006.01.2015S035000511160S0360003271

    If you drop the first 68 bytes of your example, zlib.decompress() will return the above.

    It's up to you to figure out what the first 68 bytes are.
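As a starting point for that, here is a sketch that splits the 68 bytes of the example from the question. The layout is an assumption read off this one sample, not taken from any specification: 14 ASCII characters, then what looks like a DER SEQUENCE (30 2c) holding two 20-byte INTEGERs (plausibly the DSA values r and s), four zero bytes, and four ASCII digits. Those digits read 0271, which matches the number of compressed bytes that follow in the example. The helper read_der_integer is hypothetical and only handles short-form lengths.

```python
# The first 68 bytes of the hex data from the question, grouped by assumed field.
HEADER_HEX = (
    "23 55 54 30 31 30 30 38 30 30 30 30 30 31 "
    "30 2c "
    "02 14 1c 3d e9 2d cd 5e c4 c0 56 bd ae 61 3e 54 ad a1 b3 26 33 d2 "
    "02 14 40 75 03 d0 cf 9c c1 f5 70 58 bd 59 50 a7 af c5 eb 0a f4 74 "
    "00 00 00 00 "
    "30 32 37 31"
)
header = bytes.fromhex(HEADER_HEX)

def read_der_integer(buf, pos):
    """Read one short-form DER INTEGER at pos; return (value bytes, next pos)."""
    if buf[pos] != 0x02:
        raise ValueError("expected INTEGER tag 0x02")
    length = buf[pos + 1]
    start = pos + 2
    return buf[start:start + length], start + length

identifier = header[:14].decode("ascii")   # assumed: ASCII identifier
if header[14] != 0x30:
    raise ValueError("expected SEQUENCE tag 0x30")
seq_len = header[15]                        # 0x2c = 44 bytes of content
r, pos = read_der_integer(header, 16)       # assumed: DSA r
s, pos = read_der_integer(header, pos)      # assumed: DSA s
padding = header[pos:64]                    # zero bytes in this sample
length_field = header[64:68].decode("ascii")

print(identifier, seq_len, r.hex(), s.hex(), length_field)
```

Under this reading, the "0," that appears after the identifier when the header is printed as text is simply the bytes 30 2c, the start of the signature structure rather than part of the identifier.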