Tags: python, string, file, seek, tell

Python: scan file for substring, save position, then return to it


I'm writing a script that needs to scan a file until it finds the line containing a substring, save the position of the beginning of that line, and then return to it later. I'm very new to Python, so I haven't had much success yet. Here is my current code:

with open("test.txt") as f:
    pos = 0
    line = f.readline()
    while line:
        if "That is not dead" in line:
            pos = f.tell() - len(line.encode('utf-8'))
            # pos = f.tell()

        line = f.readline()

    f.seek(pos)
    str = f.readline()
    print(str)

With test.txt:

That is not dead
Which can eternal lie
Till through strange aeons
Even Death may die

Sphinx of black quartz, judge my vow!

Here's the output:

hat is not dead

[newline character]
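The missing "T" is consistent with the file having Windows-style \r\n line endings (an assumption; the question does not say which OS or editor produced test.txt). In text mode Python translates \r\n to \n, so `line` is one character shorter than the bytes it occupied on disk, and tell() - len(line.encode('utf-8')) lands one byte past the start of the line. A minimal reproduction, writing a temporary file with explicit \r\n endings and assuming UTF-8:

```python
import os
import tempfile

# Assumption: the original file used Windows-style "\r\n" line endings.
data = b"That is not dead\r\nWhich can eternal lie\r\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(data)
    path = tmp.name

with open(path, encoding="utf-8") as f:   # text mode, universal newlines
    line = f.readline()                   # "\r\n" arrives as a single "\n"
    end_pos = f.tell()                    # 18 here: the offset just past "\r\n"
    length = len(line.encode("utf-8"))    # 17: one byte short of the on-disk line
    pos = end_pos - length                # 1, not 0
    f.seek(pos)
    shifted = f.readline()                # starts one character too late

os.remove(path)
print(repr(line), end_pos, length, repr(shifted))
# 'That is not dead\n' 18 17 'hat is not dead\n'
```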

I realized that my original pos = f.tell() gave me the position of the end of the line rather than the beginning. I found this answer detailing how to get the byte length of a string, but subtracting that length cuts off the first character. Using utf-16 or utf-16-le instead gives ValueError: negative seek position -18 or ValueError: negative seek position -16, respectively. I then tried the solution from this answer, using this code:

with open("ctest.txt") as f:
    pos = 0
    line = f.readline()
    while line:
        if "That is not dead" in line:
            print(line)
            f.seek(-len(line), 1)
            zz = f.readline()
            print(zz)
        line = f.readline()

    f.seek(pos)
    str = f.readline()
    print(str)

which raises io.UnsupportedOperation: can't do nonzero cur-relative seeks at f.seek(-len(line), 1).
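That error comes from text mode: a text-mode file only accepts a relative seek with offset 0, because its positions are opaque cookies rather than plain byte counts. Opening the file in binary mode ("rb") permits f.seek(-len(line), 1), and because `line` is then a bytes object, len(line) is its exact size on disk, including any \r. A sketch under the same assumptions as before (two lines, \r\n endings, UTF-8), decoding the matched line only at the end:

```python
import io
import os
import tempfile

data = b"That is not dead\r\nWhich can eternal lie\r\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(data)
    path = tmp.name

# Text mode: a nonzero relative seek is rejected.
with open(path, encoding="utf-8") as f:
    line = f.readline()
    try:
        f.seek(-len(line), 1)
        text_mode_error = None
    except io.UnsupportedOperation as exc:
        text_mode_error = str(exc)

# Binary mode: lines are bytes, so len() matches what is on disk.
with open(path, "rb") as f:
    pos = None
    line = f.readline()
    while line:
        if b"That is not dead" in line:
            f.seek(-len(line), 1)   # step back to the start of this line
            pos = f.tell()          # a real byte offset in binary mode
            break
        line = f.readline()

    found = None
    if pos is not None:
        f.seek(pos)                 # return to the saved line
        found = f.readline().decode("utf-8")

os.remove(path)
print(text_mode_error)
print(pos, repr(found))  # 0 'That is not dead\r\n'
```

Note that decoding bytes yourself does not translate newlines, so the "\r" stays in `found`; strip it (for example with rstrip("\r\n")) if that matters.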

Can someone please point out where I'm going wrong?


Solution

  • Stefan Papp suggested saving the position before each line is read, a simple solution I failed to consider. Here is the adjusted version:

    with open("test.txt") as f:
        pos = 0
        tempPos = 0
        line = f.readline()
        while line:
            if "That is not" in line:
                pos = tempPos

            tempPos = f.tell()
            line = f.readline()

        f.seek(pos)
        str = f.readline()
        print(str)
    

    With the correct output:

    That is not dead
    [newline character]
    

    Thanks, Stefan. I guess I was too deep into my issue to think clearly about it. If there's a better way to iterate through the file than what I've done, I'd be interested to know, but this seems to work.
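On the iteration question: `for line in f` is the usual idiom, but in text mode it disables tell() for the duration of the loop (calling it inside raises OSError: telling position disabled by next() call). Wrapping readline in iter() keeps the loop short while leaving tell() usable. A sketch with a hypothetical helper, find_line_start, that returns the position of the first matching line or None; unlike the loop above, which keeps scanning and so remembers the last match, it stops at the first one:

```python
import os
import tempfile


def find_line_start(f, needle):
    """Hypothetical helper: return the tell() position of the first line
    containing needle, searching from the current position, or None."""
    pos = f.tell()                      # start of the line about to be read
    for line in iter(f.readline, ""):   # readline, not next(), so tell() still works
        if needle in line:
            return pos
        pos = f.tell()                  # start of the following line
    return None


text = ("That is not dead\nWhich can eternal lie\nTill through strange aeons\n"
        "Even Death may die\n\nSphinx of black quartz, judge my vow!\n")
with tempfile.NamedTemporaryFile("w", delete=False, suffix=".txt",
                                 encoding="utf-8") as tmp:
    tmp.write(text)
    path = tmp.name

with open(path, encoding="utf-8") as f:
    pos = find_line_start(f, "Till through")
    f.seek(0)
    missing = find_line_start(f, "not in the file")
    found = None
    if pos is not None:
        f.seek(pos)                     # return to the saved line later on
        found = f.readline()

os.remove(path)
print(repr(found))  # 'Till through strange aeons\n'
```

The value from tell() in text mode is only meaningful as an argument to seek() on the same file, so it is best treated as a bookmark rather than a byte count.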