
create single bytes instance from sequence of memoryview


tl;dr Given a Sequence of memoryview, how can I create a single bytes instance without creating intermediate bytes instances?

The naive approach creates many intermediate instances of bytes:

from typing import Sequence

def create_bytes(seq_mv: Sequence[memoryview]) -> bytes:
    data = bytes()
    for mv in seq_mv:
        # bytes(mv) copies the view, then + builds yet another bytes object
        data = data + bytes(mv)
    return data

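The function create_bytes creates on the order of len(seq_mv) intermediate instances of bytes during execution (one per bytes(mv) conversion, plus one per concatenation), and each concatenation copies all the data accumulated so far. That is inefficient.
I want create_bytes to create exactly one new bytes instance during execution.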


Solution

  • bytes, as you noticed, is an immutable object.

    As Tim Peters put it in the comments, you can let Python create a single instance with all the parts joined together in a single call: bytes().join(seq_mv), or equivalently b"".join(seq_mv).

    If you need to perform any other operation that changes the data along the way, you could use the mutable bytearray instead, which gives you the flexibility to modify the object in place along with all the other advantages of mutable sequences.

    You can then make a single conversion to bytes at the end of your function if the callers can't work directly with a bytearray (but maybe you can just return it directly):

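    from typing import Sequence

    def create_bytes(seq_mv: Sequence[memoryview]) -> bytes:
        data = bytearray()
        for mv in seq_mv:
            # extend() appends the view's bytes to the bytearray in place
            data.extend(mv)
        # one final copy into an immutable bytes object
        return bytes(data)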
    

    Or simply:

    from functools import reduce
    
    # extend() returns None, so "or acc" hands the accumulator back to reduce
    data = reduce(lambda acc, mv: acc.extend(mv) or acc, seq_mv, bytearray())
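
For completeness, the join-based approach mentioned above could look roughly like the sketch below. It assumes every memoryview in the sequence is a contiguous buffer, since join reads each item through the buffer protocol. The names source, views and result are only illustrative. In CPython, join computes the total length first and allocates the result once, so no per-item bytes(mv) copies should be needed.

```python
from typing import Sequence


def create_bytes(seq_mv: Sequence[memoryview]) -> bytes:
    # bytes.join accepts any iterable of bytes-like objects, including
    # memoryview, so the views are copied straight into the result.
    return b"".join(seq_mv)


# Illustrative usage: three views over existing buffers, joined once.
source = bytearray(b"hello world")
views = [memoryview(source)[0:5], memoryview(b", "), memoryview(source)[6:11]]
result = create_bytes(views)
```

If the caller is happy with a bytearray, the loop version with extend avoids even the final conversion; if it needs bytes, join is the shortest way to get there.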