Getting Webpage Title, Img, Metadata info from Linux Terminal


Is there a way, or a tool, to get a webpage's title, metadata such as the description, and maybe a small screenshot of the page from a shell script?

Thanks in advance!


Solution

  • You could use curl or wget to fetch the webpage and then pipe it to sed to extract the contents of various tags. It's kludgy, but that's about what you get when you do this kind of thing in a shell script.

    For example:

    wget -q -O - http://example.com | grep '<title>' | sed 's/.*<title>\([^<]*\).*/\1/'
    

    will give you the contents of the title tag, provided the opening tag and its text are on one line. Note that this gives you the raw, unparsed source, so HTML entities are not decoded: you get IANA &mdash; Example domains instead of IANA – Example domains.

    Have you considered using something like Perl?
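
The same idea can be stretched to the description metadata. What follows is only a rough sketch: `page_title` and `page_description` are hypothetical helper names, not existing tools, and they assume lowercase tags, a title that fits on one line, and a `<meta>` tag that puts `name` before `content` with double-quoted values. Real pages often break those assumptions, and entities such as `&mdash;` are still left undecoded.

```shell
#!/bin/sh
# Hypothetical helpers: each reads HTML on stdin and prints the first match.

# Print the text of the first <title> element (single-line titles only).
page_title() {
    sed -n 's/.*<title[^>]*>\([^<]*\)<\/title>.*/\1/p' | head -n 1
}

# Print the content attribute of <meta name="description" ...>.
# Assumes name comes before content and both values are double-quoted.
page_description() {
    sed -n 's/.*<meta[^>]*name="description"[^>]*content="\([^"]*\)".*/\1/p' | head -n 1
}

# Usage (needs network access):
#   wget -q -O - http://example.com | page_title
#   wget -q -O - http://example.com | page_description
```

Each helper prints nothing when the page has no matching tag, so a calling script can test for an empty result before using it.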