I am trying to convert websites into the HTML data structure provided by blaze.
curl -S http://jaspervdj.be/blaze | blaze-from-html
This example is taken from the end of the blaze-html tutorial. curl obviously works, but the library can't build the HTML:
html $ do
    H.head $ H.title "301 Moved Permanently"
blaze-from-html: Attribute bgcolor is illegal in html5
Indeed, bgcolor has been deprecated. How do I get blaze to run with HTML4?
curl -S http://jaspervdj.be/blaze | blaze-from-html -v html4-transitional
As suggested in the comments, I used the html4-transitional variant, and now I get a 301. Is this page being redirected?
html $ do
    H.head $ H.title "301 Moved Permanently"
    body ! bgcolor "white" $ do
        center $ h1 "301 Moved Permanently"
        hr
        center "nginx/1.2.1"
However, wget http://jaspervdj.be/blaze returns the HTML content of the page.
This works for me:
curl -S http://jaspervdj.be/blaze | blaze-from-html -v html4-transitional
As suggested in the documentation you linked.
As for why the page is empty and says it has been redirected: curl treats http://jaspervdj.be/blaze and http://jaspervdj.be/blaze/ as different URLs, and the website you're downloading from erroneously treats them differently as well, whereas wget seems to follow the redirect automatically, like my browser does. I would suggest contacting the website author and asking him to fix this behavior.
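
The difference between curl and wget here comes down to redirects: curl prints the body of the 301 response unless it is told to follow the Location header, which its -L (--location) option does. A minimal sketch, assuming the 301 simply points at the trailing-slash URL; the function name fetch_as_blaze is made up for illustration, and -v html4-transitional is the variant flag from the question:

```shell
# Hypothetical helper: fetch a page, following any redirects, and pipe the
# result through blaze-from-html.
fetch_as_blaze() {
    # -s: hide the progress meter, -S: still show errors, -L: follow redirects
    curl -sSL "$1" | blaze-from-html -v html4-transitional
}

# Usage: fetch_as_blaze http://jaspervdj.be/blaze
```

With -L, curl should end up at whatever page the server redirects to, so blaze-from-html receives the real document rather than the nginx 301 stub. This only works around the redirect on the client side; it does not change how the server treats the two URLs.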