I found this Bash script, which checks the status of URLs listed in a text file and prints the destination URL when there is a redirect:
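#!/bin/bash
# Expects a file with one URL per line as the first argument.
while IFS= read -r url
do
    dt=$(date '+%H:%M:%S')
    # --head requests headers only; --write-out prints the status code and the redirect target, if any
    urlstatus=$(curl -kH 'Cache-Control: no-cache' -o /dev/null --silent --head --write-out '%{http_code} %{redirect_url}' "$url")
    echo "$url $urlstatus $dt" >> urlstatus.txt
done < "$1"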
I'm not that good at Bash. I'd like to add, for each URL, the value of its robots meta tag (if it exists).
I'd really suggest a DOM parser for this (e.g. Nokogiri, hxselect, etc.),
but if you want something quick, sed can do it. The command below handles a <meta name="robots" ...> tag
that sits on a single line, with name before content, and "extracts" the value of its content attribute:
curl -s "$url" | sed -n 's/.*<meta[[:space:]][[:space:]]*name="*robots"*[[:space:]][[:space:]]*content="*\([^">]*\)"*[[:space:]]*\/*>.*/\1/p'
This prints the value of the content attribute, or nothing at all if no matching tag is found, so capturing it with $(...) gives you the empty string in that case.
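To put the two pieces together, here is a rough sketch of the original loop with a robots column added. The function names robots_from_html and check_urls are made up for this example, and the "-" placeholder for a missing tag is my own choice. It assumes a second curl request per URL is acceptable, since --head only returns headers and the meta tag lives in the body.

```shell
#!/bin/bash
# Sketch only: robots_from_html and check_urls are hypothetical helper names.

# Read HTML on stdin and print the content value of the first robots meta tag.
# Assumes the tag is on one line with name before content; prints nothing otherwise.
robots_from_html() {
  sed -n 's/.*<meta[[:space:]][[:space:]]*name="*robots"*[[:space:]][[:space:]]*content="*\([^">]*\)"*[[:space:]]*\/*>.*/\1/p' | head -n 1
}

# Read URLs from the file given as $1 and append one line per URL to urlstatus.txt.
check_urls() {
  local url dt urlstatus robots
  while IFS= read -r url; do
    [ -n "$url" ] || continue   # skip blank lines
    dt=$(date '+%H:%M:%S')
    urlstatus=$(curl -kH 'Cache-Control: no-cache' -o /dev/null --silent --head --write-out '%{http_code} %{redirect_url}' "$url")
    # Second request: fetch the body, because --head returns headers only.
    robots=$(curl -ks "$url" | robots_from_html)
    # "-" marks a page without a (matching) robots meta tag.
    echo "$url $urlstatus ${robots:--} $dt" >> urlstatus.txt
  done < "$1"
}

# Run only when a URL file is passed as the first argument.
if [ "$#" -gt 0 ]; then
  check_urls "$1"
fi
```

Note that the body request does not follow redirects, so for a redirected URL you will most likely get "-". Adding -L to the second curl call would read the tag from the final destination instead, if that is what you want.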
Do you need a pure Bash solution, or do you have sed available?