
Find/Replace htmlentities using the standard linux toolchain?


Is there a way I can do something like the following using the standard linux toolchain?

Let's say the source at example.com/index.php is:

Hello, &amp; world! &quot;

How can I do something like this...

curl -s http://example.com/index.php | htmlentities

...that would print the following:

Hello, & world! "

Using only the standard linux toolchain?


Solution

  • Use recode.

    $ echo 'Hello, &amp; world! &quot;' | recode HTML_4.0
    Hello, & world! "
    

    EDIT: By the way, recode offers several conversions corresponding to different versions of HTML and XML, so you can use e.g. HTML_3.2 instead of HTML_4.0 if you have a really old HTML document. Running recode -l prints the complete list of charsets the program supports.
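
If recode is not installed on your system, a rough fallback can be put together with sed alone. This is only a minimal sketch, not a replacement for recode: the function name decode_basic_entities is made up for illustration, and it handles just five common entities (&lt;, &gt;, &quot;, &#39; and &amp;), not the full HTML entity table or arbitrary numeric references.

```shell
# Hypothetical helper: decode a handful of common HTML entities with sed.
# &amp; is replaced last so that "&amp;lt;" becomes "&lt;" rather than "<".
# In a sed replacement, a bare & means "the matched text", so it is escaped.
decode_basic_entities() {
  sed -e 's/&lt;/</g' \
      -e 's/&gt;/>/g' \
      -e 's/&quot;/"/g' \
      -e "s/&#39;/'/g" \
      -e 's/&amp;/\&/g'
}

printf '%s\n' 'Hello, &amp; world! &quot;' | decode_basic_entities
```

With the function defined, the pipeline from the question would look like curl -s http://example.com/index.php | decode_basic_entities. Anything outside those five entities passes through unchanged, so recode remains the better choice when it is available.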