pythonweb.py

How to read .txt file without .readlines() / replace UTF-8 newline character with \n?


I have some AI-generated nonsense in a .txt file that looks like this:

MENENIUS:
I have been they prayers of the reason,
And away to friends than the state pointer;
The words that shall can virtue to your head.

I have some Python code (using web.py) that looks like this:

class index(object):
    def GET(self):
        text = open("menenius.txt", "r").read() 
        return render.index(text)

When I view it in localhost, it looks like this:

MENENIUS: I have been they prayers of the reason, And away to friends than the state pointer; The words that shall can virtue to your head.

Menenius' little speech is actually just one clipping of a much larger .txt file, so I don't want to use .readlines() as going over the list will be memory-intensive. If that weren't an issue, in a normal script I'd be able to just print the list that .readlines() generates, but the fact that I'm using web.py and need to get this into render.index() complicates things.

What I've Tried

My first thought was to use the .replace() method in the script that generates menenius.txt to replace every instance of the invisible UTF-8 newline character with \n. Since .read() gives you the entire .txt file as a single string, I thought that would work but doing this:

from_text = open("menenius.txt", "r").read()
from_text.replace(0x0A, "\n")

Gets me this error, referring to the line with .replace():

TypeError: expected a character buffer object

Which I've googled, but none of it seems very applicable or very clear. I'm just starting out with Python and I've been going around in circles with this for a couple of hours, so I feel like there's something really obvious here that I don't know about.


As I mentioned I've also tried returning the list that .readlines() generates, but that's going to get memory-intensive and I'm not sure how to fit that output into render.index() anyway.

Edit: The Solution

So the answer below works, but after I made that change I was still having the same issue. ShadowRanger's "I'm assuming your renderer is sending out HTML" got me thinking, and I opened up localhost and got into the web inspector to see that all the text was in quotation marks within its p tags, like so:

<p>
"MENENIUS: I have been they prayers of the reason, And away to friends than the state pointer; The words that shall can virtue to your head."
</p>

I came back to this after a few hours having realised something. In the index.html file the content was being sent to, it looked like this:

<p>
$content
</p>

I had a suspicion, checked the web.py intro tutorial again and found this:

As you can see, the templates look a lot like Python files except for the def with statement at the top (saying what the template gets called with) and the $s placed in front of any code. Currently, template.py requires the $def statement to be the first line of the file. Also, note that web.py automatically escapes any variables used here, so that if for some reason name is set to a value containing some HTML, it will get properly escaped and appear as plain text. If you want to turn this off, write $:name instead of $name.

I changed $content to $:content, and suddenly the text is being rendered as HTML rather than as a string.


Solution

  • Your file already contains newlines ('\x0a' is an escape for the exact same character that '\n' produces). I'm assuming your renderer is sending out HTML though, and HTML doesn't care about newlines in the text (outside of pre blocks, and other blocks styled similarly).

    So either wrap the data in a pre block, or replace the '\n' with <br> tags (which are how HTML says "No, really, I want a line break"), e.g.:

    from_text = from_text.replace("\n", "<br>\n")
    

    Leaving in the newlines may be handy to people viewing the source, so I replaced with both the <br> tag and a newline (Python won't replace in a replacement, so don't worry about infinite replacement just because the newline was part of the replacement).