python stdin readline

How can I read more than 4096 bytes from stdin, copy-pasted to a terminal on Linux?


I have this code:

import sys

binfile = "data.hex"
print("Paste ascii encoded data.")

line = sys.stdin.readline()
b = bytes.fromhex(line)

with open(binfile, "wb") as fp:
    fp.write(b)

The problem is that the sys.stdin.readline() call never reads more than 4096 bytes. How can I make that buffer larger? I tried passing a larger size to the call, but that had no effect.

Update: I changed my stdin-reading code to this:

line = ''
while True:
    b = sys.stdin.read(1)
    sys.stdin.flush()
    line += b

    if b == "\n":
        break

print(f"Read {len(line)} bytes")

I still run into that limit.


Solution

  • The truncation of long lines to 4096 bytes is done by the terminal (TTY) code in the Linux kernel. (As part of the truncation, the last of the 4096 bytes is also replaced with a newline byte.) By the time the Python process reads from the TTY as its stdin, the line has already been truncated. There is no easy way to prevent this truncation when copy-pasting long lines into the terminal window. As a workaround, copy-paste into a file (e.g. infile.dat) instead, and then run python script.py <infile.dat.
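
A minimal sketch of how the script could read its input for the file-redirect workaround. The helper name hex_stream_to_bytes is hypothetical, and stripping all whitespace before decoding is an assumption about how the pasted data is laid out:

```python
def hex_stream_to_bytes(stream):
    """Read hex text from `stream` until EOF and decode it to bytes."""
    text = stream.read()
    # Drop all whitespace so that line breaks in the saved data don't matter.
    return bytes.fromhex("".join(text.split()))
```

In the script, data = hex_stream_to_bytes(sys.stdin) (after import sys) would replace the readline() call. With python script.py <infile.dat, stdin is a regular file rather than a TTY, so the kernel's line limit should not come into play.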

    It's easy to reproduce the truncation behavior even without Python, by running dd bs=65536 of=/dev/null, copy-pasting a line longer than 4096 bytes, and then pressing Ctrl-D to indicate EOF. The last line of the output will start with 4096 bytes (4.1 kB, 4.0 KiB) copied, indicating that only 4096 bytes were read. If you copy-paste multiple long lines, you'll see that each of them gets truncated to 4096 bytes (including the newline byte) separately.
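
For readers who prefer to stay in Python, the same measurement can be sketched with unbuffered os.read() calls. The function name read_sizes is hypothetical, and this is only a rough analogue of the dd command above, not a replacement for it:

```python
import os


def read_sizes(fd, bufsize=65536):
    """Return the size of each chunk that os.read() yields until EOF."""
    sizes = []
    while True:
        chunk = os.read(fd, bufsize)
        if not chunk:  # an empty result means EOF (Ctrl-D on a terminal)
            break
        sizes.append(len(chunk))
    return sizes
```

Calling print(read_sizes(0)) in a terminal, pasting long lines and pressing Ctrl-D should, given the truncation described above, report no chunk larger than 4096 bytes.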

    More analysis of this Linux kernel behavior:

    The byte count limit of 4096 is hardcoded in the Linux kernel as N_TTY_BUF_SIZE.

    The rest of my answer demonstrates that Python and the shell (e.g. Bash) work without truncation, so they are not causing the issue.


    This is to demonstrate that Python sys.stdin.readline() doesn't truncate, so there is no need to change your Python code.

    Python sys.stdin.readline() has an unlimited buffer (given that there is enough free memory). I've tried it with Python 2.7, 3.6 and newer Python on Linux.
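
One way to check this claim without a terminal in the way is to feed a long line to a child Python process through a pipe. This is a sketch, not the original test from this answer; the function name readline_length_via_pipe is hypothetical, and it assumes Python 3.7 or newer for capture_output:

```python
import subprocess
import sys


def readline_length_via_pipe(n_chars):
    """Pipe one line of n_chars characters plus a newline to a child Python
    process and return the length that its sys.stdin.readline() saw."""
    child_code = "import sys; print(len(sys.stdin.readline()))"
    line = "a" * n_chars + "\n"
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        input=line,           # sent to the child's stdin through a pipe
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)
```

If readline() were the source of the limit, readline_length_via_pipe(100000) would come back as 4096 rather than the full line length including the newline.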

    Here is what I've tried:


    In some cases it's useful to pass input as fast as possible (i.e. as soon as the process receives it) to the Python program, i.e. to prevent delay caused by buffering in sys.stdin.

    Good news: sys.stdin.readline() returns the next line as soon as it is available to the process; it doesn't wait for subsequent lines. In a loop, use for line in iter(sys.stdin.readline, ''): rather than for line in sys.stdin:, because the latter waits for more input even if a line is available. See https://stackoverflow.com/a/28919832/97248 and other answers for details.
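
The loop idiom mentioned above can be sketched as follows. The function name line_lengths is hypothetical, and io.StringIO stands in for sys.stdin so that the example runs on its own:

```python
import io


def line_lengths(stream):
    """Collect the length of each line, handling lines as they arrive."""
    lengths = []
    # iter(callable, sentinel) calls stream.readline() repeatedly and stops
    # when it returns '', which is what readline() returns at EOF.
    for line in iter(stream.readline, ""):
        lengths.append(len(line))
    return lengths


demo = io.StringIO("first\nsecond\n")
print(line_lengths(demo))  # prints [6, 7]
```

In a real program, line_lengths(sys.stdin) would process each line as soon as readline() returns it.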

    sys.stdin.read(n) typically has a buffering delay: even if the process has already received n bytes, sys.stdin.read(n) will keep waiting for more bytes until its buffer (typically 8192 bytes) is filled. To avoid this delay in Python 3, use sys.stdin.buffer.raw.read(n) instead. This reads at most n bytes (not Unicode characters), and it returns as soon as at least 1 byte is available. Don't mix it with sys.stdin.readline(). In both Python 2 and 3, you can use os.read(sys.stdin.fileno(), n) for this. Test the buffering delay using a pipe (e.g. cat | python ...), because without a pipe the system may use a terminal (TTY) device, which is line-buffered by default and returns data earlier, at the end of each line.
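
A small sketch of the os.read() behavior described above, using a pipe created inside the process instead of real stdin. The function name demo_partial_read is hypothetical:

```python
import os


def demo_partial_read():
    """Write 5 bytes to a pipe, then request up to 8192 bytes from it."""
    r, w = os.pipe()
    try:
        os.write(w, b"hello")
        # os.read() returns what is already available (5 bytes here)
        # instead of waiting until 8192 bytes have arrived.
        return os.read(r, 8192)
    finally:
        os.close(r)
        os.close(w)
```

Against real input, the equivalent call would be os.read(sys.stdin.fileno(), n), which should likewise return as soon as some data is available.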


    This is to demonstrate that it's not the shell that causes truncation. Here is how:

    This demo has proved that truncation happens even if there is no shell (or Python process).