I need to find all the palindromes that occur in a certain text. I will read the data from an external file. Because the data has to be handled in a memory-efficient way, I use a memoryview object. However, I need to perform some string operations on the data, so I used the tobytes() method. Is this the correct way to handle these objects without copying the data?
from collections import Counter

def is_palindrome(word):
    # assumed helper (not shown in the post): True if word reads the same reversed
    return word == word[::-1]

palindrome = []
# read the file as binary data
with open('some_text.txt', 'rb') as fr:
    # create a memoryview object
    data = memoryview(fr.read())
    # apply the tobytes() method
    text = data.tobytes()
# split the text into words
for word in text.split():
    # append to the palindrome list if the word is a palindrome
    if is_palindrome(word):
        palindrome.append(word)
# build a Counter with the palindromes and their number of occurrences
palindrome = Counter(palindrome)
print(palindrome)
You may just use the bytes from fr.read():
with open('some_text.txt', 'rb') as f:
    b = f.read()
    print(b.__class__, id(b), len(b))
    data = memoryview(b)
    text = data.tobytes()
    print(text.__class__, id(text), len(text))
Possible output:
<class 'bytes'> 47642448 173227
<class 'bytes'> 47815728 173227
In CPython, id() returns the address of the object in memory. The two ids above differ, so data.tobytes() returns a copy in this case.
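To illustrate the difference between a copy and a view, here is a minimal sketch. The sample bytes stand in for the file contents, and is_palindrome_view is a hypothetical helper, not something from the question. It assumes CPython 3, where slicing a memoryview yields another view onto the same buffer, while tobytes() builds a new bytes object.

```python
# Sketch: a memoryview slice shares the buffer, tobytes() copies it.
b = b"level noon python"   # stand-in for the result of f.read()
data = memoryview(b)

view = data[0:5]           # slicing a memoryview does not copy the bytes
copy = data.tobytes()      # creates a new bytes object with the same contents


def is_palindrome_view(mv):
    """Hypothetical helper: palindrome test directly on a memoryview."""
    # mv[::-1] is a reversed view; == compares the two views element-wise
    return mv == mv[::-1]


print(view.obj is b)             # the slice still refers to the original object
print(copy == b, copy is b)      # same contents, but is it the same object?
print(is_palindrome_view(view))  # checks b"level" without calling tobytes()
```

Comparing views this way avoids the full copy, but methods such as split() are not available on a memoryview, so word boundaries would have to be located separately.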
Consider using text mode:
with open('some_text.txt', 'r') as f: