parsingbittorrent

Reading the fileset from a torrent


I want to (quickly) put a program/script together to read the fileset from a .torrent file. I want to then use that set to delete any files from a specific directory that do not belong to the torrent.

Any recommendations on a handy library for reading this index from the .torrent file? Whilst I don't object to it, I don't want to be digging deep into the bittorrent spec and rolling a load of code from scratch for this simple purpose.

I have no preference on language.


Solution

  • Effbot has your question answered. Here is the complete code to read the list of files from .torrent file (Python 2.4+):

    import re
    
    def tokenize(text, match=re.compile("([idel])|(\d+):|(-?\d+)").match):
        i = 0
        while i < len(text):
            m = match(text, i)
            s = m.group(m.lastindex)
            i = m.end()
            if m.lastindex == 2:
                yield "s"
                yield text[i:i+int(s)]
                i = i + int(s)
            else:
                yield s
    
    def decode_item(next, token):
        if token == "i":
            # integer: "i" value "e"
            data = int(next())
            if next() != "e":
                raise ValueError
        elif token == "s":
            # string: "s" value (virtual tokens)
            data = next()
        elif token == "l" or token == "d":
            # container: "l" (or "d") values "e"
            data = []
            tok = next()
            while tok != "e":
                data.append(decode_item(next, tok))
                tok = next()
            if token == "d":
                data = dict(zip(data[0::2], data[1::2]))
        else:
            raise ValueError
        return data
    
    def decode(text):
        try:
            src = tokenize(text)
            data = decode_item(src.next, src.next())
            for token in src: # look for more tokens
                raise SyntaxError("trailing junk")
        except (AttributeError, ValueError, StopIteration):
            raise SyntaxError("syntax error")
        return data
    
    if __name__ == "__main__":
        data = open("test.torrent", "rb").read()
        torrent = decode(data)
        for file in torrent["info"]["files"]:
            print "%r - %d bytes" % ("/".join(file["path"]), file["length"])