Is there a way to do something like the following using the standard Linux toolchain?
Let's say the source at example.com/index.php is:
Hello, &amp; world! &quot;
How can I do something like this...
curl -s http://example.com/index.php | htmlentities
...that would print the following:
Hello, & world! "
Using only the standard Linux toolchain?
Use recode.
$ echo 'Hello, &amp; world! &quot;' | recode HTML_4.0
Hello, & world! "
EDIT: By the way, recode offers several different conversions corresponding to different versions of HTML and XML, so you can use e.g. HTML_3.2 instead of HTML_4.0 if you have a really old HTML document. Running recode -l will print the complete list of charsets supported by the program.
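If recode is not installed, a similar pipeline filter can be sketched with Python's standard html module. This is only a rough alternative, not part of recode: it assumes python3 is on the PATH, and the function names htmlencode and htmldecode are made up for illustration.

```shell
# Hypothetical helper: escape &, <, >, " and ' read from stdin.
htmlencode() {
  python3 -c 'import html, sys; sys.stdout.write(html.escape(sys.stdin.read()))'
}

# Hypothetical helper: turn entities read from stdin back into characters.
htmldecode() {
  python3 -c 'import html, sys; sys.stdout.write(html.unescape(sys.stdin.read()))'
}

echo 'Hello, & world! "' | htmlencode          # prints: Hello, &amp; world! &quot;
echo 'Hello, &amp; world! &quot;' | htmldecode  # prints: Hello, & world! "
```

Either function can sit at the end of a curl pipeline, e.g. curl -s http://example.com/index.php | htmldecode.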