Read a null-terminated string in Python


I'm trying to read a null-terminated string, but I'm having issues when unpacking a char and appending it to a string.

This is the code:

def readString(f):
    str = ''
    while True:
        char = readChar(f)
        str = str.join(char)
        if (hex(ord(char))) == '0x0':
            break           
    return str

def readChar(f):
    char = unpack('c',f.read(1))[0]
    return char

This gives me the following error:

TypeError: sequence item 0: expected str instance, int found

I'm also trying the following:

char = unpack('c',f.read(1)).decode("ascii")

But that raises: AttributeError: 'tuple' object has no attribute 'decode'

I don't even know how to read the chars and add them to the string. Is there a proper way to do this?


Solution

  • (Edit, version 2: added an alternative approach at the end.)

    Maybe there are libraries out there that can help you with this, but as I don't know of any, let's attack the problem at hand with what we know.

    In Python 2, bytes and str are essentially the same type. That changed in Python 3, where str is what Python 2 called unicode and bytes is its own separate type. This means that on Python 2 you don't need to define a readChar, as no extra work is required, so I don't think you need the unpack function for this particular case. With that in mind, let's define a new readString:

    def readString(myfile):
        chars = []
        while True:
            c = myfile.read(1)
            # stop at the null terminator, or at end of file (read returns '')
            if not c or c == chr(0):
                return "".join(chars)
            chars.append(c)
    

    Just like your code, I read one character at a time, but I save the characters in a list instead. The reason is that strings are immutable, so doing str += char results in unnecessary copies. When I find the null character, I return the joined string. chr is the inverse of ord: it gives you the character for a given ASCII value. This excludes the null character from the result; if you need it, just move the append above the check.
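
    The function above targets Python 2. The errors in the question look like Python 3 ones: there, f.read(1) returns bytes, struct.unpack always returns a tuple, and iterating over bytes yields ints, which is what str.join complains about. Here is a minimal sketch of the same idea for Python 3. The name read_cstring, the encoding parameter, and the in-memory sample are assumptions for illustration, not part of the original code.

```python
import io
import struct

# struct.unpack always returns a tuple; format 'c' yields a 1-byte bytes object,
# so calling .decode() on the result itself fails.
unpacked = struct.unpack("c", b"A")  # (b'A',)


def read_cstring(stream, encoding="ascii"):
    """Read bytes up to (not including) the next NUL byte and decode them."""
    chunks = []
    while True:
        byte = stream.read(1)
        # An empty read means end of file; treat it like a terminator.
        if not byte or byte == b"\x00":
            return b"".join(chunks).decode(encoding)
        chunks.append(byte)


# Hypothetical in-memory stand-in for the start of the sample file.
sample = io.BytesIO(b"sword\x00Sword_Wea_Dummy\x00\xcd\xcc\xcc=")
first = read_cstring(sample)
second = read_cstring(sample)
rest = sample.read()  # the binary data after the strings is left untouched
```

    Decoding only once, after the terminator is found, keeps the loop working on bytes throughout, and the end-of-file check keeps it from spinning forever if the terminator is missing.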

    Now let's test it with your sample file.

    For instance, let's try to read "Sword_Wea_Dummy" from it:

    with open("sword.blendscn","rb") as archi:
        # simulate some prior processing by moving the file pointer
        archi.seek(6)
        string = readString(archi)
        print "string repr:", repr(string)
        print "string:", string
        print ""
        # the rest of the file is still there, waiting to be processed
        print "rest of the file: ", repr(archi.read())
    

    And this is the output:

    string repr: 'Sword_Wea_Dummy'
    string: Sword_Wea_Dummy
    
    rest of the file:  '\xcd\xcc\xcc=p=\x8a4:\xa66\xbfJ\x15\xc6=\x00\x00\x00\x00\xeaQ8?\x9e\x8d\x874$-i\xb3\x00\x00\x00\x00\x9b\xc6\xaa2K\x15\xc6=;\xa66?\x00\x00\x00\x00\xb8\x88\xbf@\x0e\xf3\xb1@ITuB\x00\x00\x80?\xcd\xcc\xcc=\x00\x00\x00\x00\xcd\xccL>'
    

    Other tests:

    >>> with open("sword.blendscn","rb") as archi:
            print readString(archi)
            print readString(archi)
            print readString(archi)
    
    
    sword
    Sword_Wea_Dummy
    ÍÌÌ=p=Š4:¦6¿JÆ=
    >>> with open("sword.blendscn","rb") as archi:
            print repr(readString(archi))
            print repr(readString(archi))
            print repr(readString(archi))
    
    
    'sword'
    'Sword_Wea_Dummy'
    '\xcd\xcc\xcc=p=\x8a4:\xa66\xbfJ\x15\xc6='
    >>> 
    

    Now that I think about it, you mention that the data portion has a fixed size. If that is true for all files, and they all have the following structure:

    [unknown-size data][known-size data]
    

    then that is a pattern we can exploit: we only need to know the size of the file, and we can get both parts as follows:

    import os
    
    def getDataPair(filename,knowSize):
        size = os.path.getsize(filename)
        with open(filename, "rb") as archi:
            unknown = archi.read(size-knowSize)
            know    = archi.read()
            return unknown, know
    

    Once you know the size of the data portion (which I found by playing with the previous example), using it is simple:

    >>> string_data, data = getDataPair("sword.blendscn", 80)
    >>> string_data
    'sword\x00Sword_Wea_Dummy\x00'
    >>> data
    '\xcd\xcc\xcc=p=\x8a4:\xa66\xbfJ\x15\xc6=\x00\x00\x00\x00\xeaQ8?\x9e\x8d\x874$-i\xb3\x00\x00\x00\x00\x9b\xc6\xaa2K\x15\xc6=;\xa66?\x00\x00\x00\x00\xb8\x88\xbf@\x0e\xf3\xb1@ITuB\x00\x00\x80?\xcd\xcc\xcc=\x00\x00\x00\x00\xcd\xccL>'
    >>> string_data.split(chr(0))
    ['sword', 'Sword_Wea_Dummy', '']
    >>>          
    

    Now a simple split is enough to get each string, and you can pass the rest of the file, contained in data, to the appropriate function for processing.
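
    As a rough Python 3 sketch of that last step, the fixed-size tail can be sliced off, the leading part split on the null byte, and the tail handed to struct. The helper name get_data_pair, the temporary file, and especially the float32 layout of the fixed-size block are assumptions made for illustration; the real layout of sword.blendscn is not known here.

```python
import os
import struct
import tempfile


def get_data_pair(filename, known_size):
    """Return (leading bytes, trailing block of known_size bytes) of a file."""
    with open(filename, "rb") as f:
        content = f.read()
    if known_size > len(content):
        raise ValueError("known_size is larger than the file")
    split_at = len(content) - known_size
    return content[:split_at], content[split_at:]


# Hypothetical sample: two names followed by two little-endian float32 values.
payload = b"sword\x00Sword_Wea_Dummy\x00" + struct.pack("<2f", 0.5, 1.0)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name
try:
    string_data, data = get_data_pair(path, 8)
finally:
    os.remove(path)

# Splitting on the terminator leaves a trailing empty piece, so filter it out.
names = [part.decode("ascii") for part in string_data.split(b"\x00") if part]
# Assumed layout: the fixed-size block holds little-endian 32-bit floats.
values = struct.unpack("<2f", data)
```

    Computing the split offset explicitly avoids the content[:-0] pitfall when known_size is zero.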