I'm to create a generator that accepts a name of the file or a fileobject, words that we're looking for in a line and stop words that tell us that we should skip this line as soon as we meet them.
I wrote a generator function, but I paid attenttion that in my realisation I can't be sure that if I open a file it will be closed afterwards, because it is not guaranteed that the generator will reach the end of its iteration.
def gen_reader(file, lookups, stopwords):
is_file = False
try:
if isinstance(file, str):
file = open(file, 'r', encoding='UTF-8')
is_file = True
except FileNotFoundError:
raise FileNotFoundError('File not found')
else:
lookups = list(map(str.lower, lookups))
stop_words = list(map(str.lower, stopwords))
for line in file:
original_line = line
line = line.lower()
if any(lookup in line for lookup in lookups) \
and not any(stop_word in line for stop_word in stop_words):
yield original_line.strip()
if is_file:
file.close()
I was going to use context manager "with" and put the search code into it, but if I already got the file, then I'd write the same code again and it wouldn't be nice, would it?
What are your ideas how I can improve my code, I ran of them.
You could write a context managed class that is iterable like this:
from pathlib import Path
from collections.abc import Iterator
from typing import TextIO, Self
class Reader:
def __init__(self, filename: Path, lookups: list[str], stopwords: list[str]):
self._lookups: set[str] = {e.lower() for e in lookups}
self._stopwords: set[str] = {e.lower() for e in stopwords}
self._fd: TextIO = filename.open()
def __enter__(self) -> Self:
return self
def __exit__(self, *_) -> None:
self._fd.close()
def __iter__(self) -> Iterator[str]:
for line in self._fd:
words: set[str] = {e.lower() for e in line.split()}
if (self._lookups & words) and not (self._stopwords & words):
yield line.rstrip()
with Reader(Path("foo.txt"), ["world"], ["banana"]) as reader:
for line in reader:
print(line)
Now let's assume that foo.txt looks like this:
hello world
banana world
goodbye my friend
Then the output would be:
hello world
Thus we ensure that the file descriptor is properly closed at the appropriate point.
Note also the use of sets for optimum performance when checking against the lookups and stop words