python memory-efficient memoryview

Is the memoryview object used correctly in this snippet?


I need to find all the palindromes that occur in a certain text, which I read from an external file. The data must be handled in a memory-efficient way, so I use a memoryview object. However, I need to perform some string operations on the data, so I call the tobytes() method. Is this the correct way to handle these objects without copying the data?

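from collections import Counter


def is_palindrome(word):
    # assumed definition (the original helper is not shown):
    # a word is a palindrome if it reads the same in both directions
    return word == word[::-1]


palindrome = []
# read file as binary data
with open('some_text.txt', 'rb') as fr:

    # create memoryview object
    data = memoryview(fr.read())

    # applying the tobytes() method
    text = data.tobytes()

    # split the text into words
    for word in text.split():
        # append to the palindrome list if the word is a palindrome
        if is_palindrome(word):
            palindrome.append(word)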

    # build a Counter object with the palindromes and the number of occurrences
    palindrome = Counter(palindrome)
    print(palindrome)

Solution

  • You can use the bytes object returned by fr.read() directly

        with open('some_text.txt', 'rb') as f:
            b = f.read()
            print(b.__class__, id(b), len(b))
            data = memoryview(b)
            text = data.tobytes()
            print(text.__class__, id(text), len(text))
    

    Possible output:

    <class 'bytes'> 47642448 173227
    <class 'bytes'> 47815728 173227
    

    In CPython, id() returns the address of the object in memory. The two ids differ, so data.tobytes() returns a copy in this case.
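
The difference between a view and a copy can also be checked without relying on id(). The following is a minimal sketch, assuming a small in-memory bytes object (raw, a made-up sample) stands in for the file contents: slicing a memoryview gives another view on the same buffer, whereas tobytes() builds a new bytes object.

```python
# Sample data standing in for the file contents (an assumption for illustration).
raw = b"level noon data " * 1000

view = memoryview(raw)

# Slicing a memoryview creates another view on the same buffer; no bytes are copied.
part = view[6:10]                    # covers the word b"noon"
shares_buffer = part.obj is raw      # the slice still refers to the original object

# tobytes() builds a new, independent bytes object holding a copy of the data.
copy = view.tobytes()
is_copy = copy is not raw and copy == raw

# A palindrome test can run on the view itself: memoryview supports
# reversed slicing and equality comparison.
is_pal = part == part[::-1]

print(shares_buffer, is_copy, is_pal)
```

Note that memoryview has no split() method, so splitting the text into words still requires a bytes or str object. The view only helps if the word boundaries are found by other means.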

    Consider using text mode:

    with open('some_text.txt', 'r') as f: