
Is it possible to decode HTML using lynx in a Python script?


Let the html variable be a string containing the whole source code of a webpage, e.g.

html = "<!doctype html>\n<html><head><title>My title</title></head>LOTS OF CHARS HERE</html>"

I would like to print this web page in a human-readable format, using lynx if possible. I tried various things along the lines of

print(subprocess.run(['echo', html, '|', 'lynx', '-stdin', '-dump'], capture_output=True, text=True).stdout)

or

p1 = subprocess.Popen(["echo", html], stdout=subprocess.PIPE)
print(subprocess.run(['lynx', '-stdin', '-dump'], stdin=p1.stdout, capture_output=True, text=True).stdout)

but the second attempt fails with the following error:

OSError: [Errno 7] Argument list too long: 'echo'

Any idea how to make it work?


Solution

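  • There's no need for echo: pass html to lynx on standard input through the input argument of subprocess.run. The error occurs because echo receives the entire page as a single command-line argument, which exceeds the operating system's limit on argument length. Note also that in the first attempt the '|' is handed to echo as a literal argument, since no shell is involved to interpret it as a pipe.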

    print(subprocess.run(['lynx', '-stdin', '-dump'], input=html, capture_output=True, text=True).stdout)
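If this is needed in more than one place, the same call can be wrapped in a small helper. The following is only a sketch: the function name render_html and its command parameter are hypothetical names introduced here, and it assumes lynx is installed and on the PATH. Passing check=True is an extra choice, not part of the answer above, so that a failing lynx raises an exception instead of silently returning empty output.

```python
import subprocess


def render_html(html, command=("lynx", "-stdin", "-dump")):
    """Send html to the command's standard input and return its output as text.

    render_html and the command parameter are illustrative names; the default
    command is the lynx invocation from the answer above.
    """
    result = subprocess.run(
        list(command),
        input=html,           # written to the child's stdin, so size is not limited by argv
        capture_output=True,  # collect stdout and stderr instead of inheriting them
        text=True,            # str in, str out, rather than bytes
        check=True,           # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

With the html string from the question, print(render_html(html)) should then print the rendered page. Because the data travels over a pipe rather than the argument list, the length of html does not trigger the "Argument list too long" error.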