How can I read more than 4096 bytes from stdin, copy-pasted to a terminal on Linux?

I have this code:

import sys

binfile = "data.hex"
print("Paste ascii encoded data.")

line = sys.stdin.readline()
b = bytes.fromhex(line)

with open(binfile, "wb") as fp:
    fp.write(b)

The problem is that the sys.stdin.readline() call never reads more than 4096 bytes. How can I make that buffer larger? I tried passing a larger number to the call, but that had no effect.

Update: I changed my stdin-reading code to this:

line = ''
while True:
    b = sys.stdin.read(1)
    sys.stdin.flush()
    line += b

    if b == "\n":
        break

print(f"Read {len(line)} bytes")

I still run into that limit.

Solution

This truncate-long-lines-to-4096-bytes behavior is caused by the terminal (TTY) code in the Linux kernel. (Actually, as part of the truncation, the last byte of the 4096 bytes is also replaced with a newline byte.) By the time the (Python) process reads from the TTY as its stdin, the line has already been truncated. There is no easy fix for your use case, i.e. to prevent truncation when copy-pasting long lines to the terminal window. As a workaround, copy-paste to a file (e.g. infile.dat) instead, and then run python script.py <infile.dat.
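A minimal sketch of the reading side of that workaround, assuming the hex text arrives through redirection (python script.py <infile.dat), so stdin is a file rather than a TTY. The helper name read_hex_stream is hypothetical, and an in-memory stream stands in for sys.stdin in the demo:

```python
import io


def read_hex_stream(stream):
    """Read all text from `stream` until EOF and decode it as hex.

    Whitespace (including line breaks) is removed first, so the input
    may be one long line or many shorter ones.
    """
    text = stream.read()
    return bytes.fromhex("".join(text.split()))


# Demo: io.StringIO stands in for a redirected sys.stdin.
demo = io.StringIO("48656c\n6c6f\n")
data = read_hex_stream(demo)
print(data)
```

In the script from the question, the call would be read_hex_stream(sys.stdin), with the result written to binfile as before.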

It's easy to reproduce the truncation behavior even without Python, by running dd bs=65536 of=/dev/null, copy-pasting a line longer than 4096 bytes, and then pressing Ctrl-D to indicate EOF. The last line of the output will start with 4096 bytes (4.1 kB, 4.0 KiB) copied,, indicating that only 4096 bytes were read. If you copy-paste multiple long lines, you'll see that each of them will get truncated to 4096 bytes (including the newline byte) separately.

More analysis of this Linux kernel behavior:

Linux terminal input: reading user input from terminal truncating lines at 4095 character limit
Is there any limit on line length when pasting to a terminal in Linux?
Both answers above explain how the Linux TTY line editor can be disabled with stty -icanon, and this will prevent the truncation. See more details on how to do it in the answers. However, please don't do it in random programs, because it changes other terminal behavior as well (such as detecting Ctrl-C and disabling input echo), and this will confuse your users.
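For illustration only, here is a rough Python sketch of what stty -icanon amounts to, using the standard termios module (Unix only). The function name read_long_line is hypothetical. It restores the saved settings afterwards, which is the least a program should do given the caveat above. It assumes fd refers to a terminal; termios raises an error otherwise.

```python
import os
import termios


def read_long_line(fd):
    """Read from terminal `fd` with canonical mode off until a newline arrives.

    Returns everything read up to and including the chunk that contains
    the first newline, so bytes after that newline may be included.
    """
    saved = termios.tcgetattr(fd)
    attrs = termios.tcgetattr(fd)
    attrs[3] &= ~termios.ICANON      # index 3 is the local-mode flags (lflag)
    attrs[6][termios.VMIN] = 1       # read() returns once >= 1 byte is available
    attrs[6][termios.VTIME] = 0      # no inter-byte timeout
    termios.tcsetattr(fd, termios.TCSANOW, attrs)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, 65536)
            if not chunk:            # EOF or hangup
                break
            chunks.append(chunk)
            if b"\n" in chunk:
                break
        return b"".join(chunks)
    finally:
        # Always put the terminal back the way it was.
        termios.tcsetattr(fd, termios.TCSANOW, saved)
```

A call would look like read_long_line(sys.stdin.fileno()). Note that with canonical mode off, Ctrl-D no longer signals EOF, so the loop relies on the newline to stop.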

The byte count limit of 4096 is hardcoded in the Linux kernel as N_TTY_BUF_SIZE.
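Since each pasted line is truncated separately, one possible way around the limit is to break the hex text into lines well below 4096 bytes before copying it. The sketch below shows that idea. split_for_paste is a hypothetical helper, and the default width of 1024 is an arbitrary conservative choice, not a value taken from the kernel. The receiving program then has to read until EOF and join the lines, rather than call readline() once.

```python
N_TTY_BUF_SIZE = 4096  # the kernel limit named above


def split_for_paste(hex_text, width=1024):
    """Split hex text into lines of at most `width` characters.

    `width` should be even so that no hex byte pair is split across
    lines, and it must stay below N_TTY_BUF_SIZE to leave room for
    the newline.
    """
    if not 0 < width < N_TTY_BUF_SIZE or width % 2:
        raise ValueError("width must be even and between 2 and 4094")
    compact = "".join(hex_text.split())  # drop existing whitespace
    return [compact[i:i + width] for i in range(0, len(compact), width)]


lines = split_for_paste("ab" * 5000)
print(len(lines), max(len(line) for line in lines))
```

Printing "\n".join(lines) gives text that can be copied and pasted without any single line reaching the limit.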

The rest of my answer demonstrates how Python and the shell (e.g. Bash) work without truncation, thus they are not causing the issue.

This is to demonstrate that Python sys.stdin.readline() doesn't truncate, so there is no need to change your Python code.

Python sys.stdin.readline() has an unlimited buffer (given that there is enough free memory). I've tried it with Python 2.7, 3.6 and newer Python on Linux.

Here is what I've tried:

Reading short lines from a pipe immediately (without additional buffering delay in Python):
```
$ (echo -n A; sleep .3; echo a; sleep .3; echo B; sleep .3) | python -c "if 1:
  for line in iter(__import__('sys').stdin.readline, ''): print([line])"
['Aa\n']
['B\n']
```
To try it, run the command without the leading $. It works on Linux for me, I think it will work on macOS, Windows and other systems. On Windows, you may want to drop the if 1: and the line breaks.
Reading short lines as bytes (rather than Unicode characters) from a pipe immediately, in Python 3.x, using sys.stdin.buffer.readline():
```
$ (echo -n A; sleep .3; echo a; sleep .3; echo B; sleep .3) | python -c "if 1:
  for line in iter(__import__('sys').stdin.buffer.readline, b''): print([line])"
[b'Aa\n']
[b'B\n']
```
To try it, run the command without the leading $. It works on Linux for me, I think it will work on macOS, Windows and other systems. On Windows, you may want to drop the if 1: and the line breaks.

Reading a long (longer than 10 MiB) line immediately, without truncation from a pipe:

```
$ python -c "if 1:
      import sys, time; f = sys.stdout
      f.write('A' * 10987654); f.flush(); time.sleep(.3)
      f.write('aaa\n'); f.flush(); time.sleep(.3)
      f.write('B\n'); f.flush(); time.sleep(.3)" |
  python -c "if 1:
      for line in iter(__import__('sys').stdin.readline, ''): print(len(line))"
10987658
2
```

To try it, run the command without the leading $. It works on Linux for me, I think it will work on macOS, Windows and other systems. On Windows, put the Python code to files a.py and b.py, and then run python a.py | python b.py.

In some cases it's useful to pass input as fast as possible (i.e. as soon as the process receives it) to the Python program, i.e. to prevent delay caused by buffering in sys.stdin.

Good news: sys.stdin.readline() returns the next line as soon as it is available to the process, it doesn't wait for subsequent lines. In a loop, use for line in iter(sys.stdin.readline, ''):, and don't use for line in sys.stdin:, because the latter waits for more input even if a line is available. See https://stackoverflow.com/a/28919832/97248 and other answers for details.
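The recommended loop can be sketched as a small reusable function. handle_lines is a hypothetical name, and an in-memory stream stands in for sys.stdin so the example is self-contained:

```python
import io


def handle_lines(stream, handler):
    """Call handler(line) for each line as soon as readline() returns it.

    iter(callable, sentinel) keeps calling stream.readline() until it
    returns the empty string, which readline() does at EOF.
    """
    for line in iter(stream.readline, ""):
        handler(line)


seen = []
handle_lines(io.StringIO("Aa\nB\n"), seen.append)
print(seen)  # ['Aa\n', 'B\n']
```

With real input, the call would be handle_lines(sys.stdin, handler). For bytes, pass sys.stdin.buffer and use b"" as the sentinel instead of "".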

sys.stdin.read(n) typically has buffering delay: even if the process has already read n bytes, sys.stdin.read(n) will be waiting for more bytes until its buffer (of typically 8192 bytes) is filled. To avoid this delay in Python 3, use sys.stdin.buffer.raw.read(n) instead. This will read at most n bytes (not Unicode characters), and it returns as soon as at least 1 byte is available. Don't mix it with sys.stdin.readline(). In Python 2 and 3, use os.read(sys.stdin.fileno(), n) for this. Test the buffering delay using a pipe (e.g. cat | python ...), because without a pipe the system may use a terminal (TTY) device, which has line buffering by default, returning data earlier, at the end-of-line.
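A small sketch of the os.read() behavior described above, using a pipe created in-process instead of stdin so that it runs on its own. read_available is a hypothetical wrapper name:

```python
import os


def read_available(fd, n=8192):
    """Return up to n bytes as soon as at least 1 byte is available.

    Returns b'' at EOF. It does not wait for n bytes to accumulate.
    """
    return os.read(fd, n)


read_fd, write_fd = os.pipe()
os.write(write_fd, b"abc")
first = read_available(read_fd)    # only 3 bytes are available, returns them
os.write(write_fd, b"de")
os.close(write_fd)
second = read_available(read_fd)
eof = read_available(read_fd)      # write end is closed: EOF
os.close(read_fd)
print(first, second, eof)
```

For stdin, the equivalent call is read_available(sys.stdin.fileno()).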

This is to demonstrate that it's not the shell that causes truncation. Here is how:

On your regular Linux GUI, run any of these commands (without the leading $):

```
$ xterm -e dd bs=1 status=progress
$ konsole -e dd bs=1 status=progress
$ gnome-terminal -- dd bs=1 status=progress
```

A new, empty terminal window will appear.

In another program, copy a line longer than 4096 bytes to the clipboard (alternatively, you may copy text containing multiple lines, some longer, some shorter).
Paste it into the new, empty terminal window. If unsure, press Shift-Insert to paste. If that doesn't work, use Ctrl-Shift-V to paste. If that doesn't work either, use Edit / Paste in the menu.
Wait a second, then press Enter in the window. dd will display something like ... bytes (...) copied, .... The byte count will be smaller than expected, indicating line truncation at 4096 bytes.
You can close the window now. Or just press Ctrl-C or Ctrl-D to exit from dd, causing the window to close.

This demo has proved that truncation happens even if there is no shell (or Python process).