Last year I had a PowerShell (v3) script that parsed the HTML of a festival page (and generated XML for my Windows Phone app).
I also asked a question about it here, and it worked like a charm.
But when I run the script this year, it no longer works. To be specific, the method getElementsByClassName does not return anything. I also tried that method on other web pages, with no luck.
Here is my code from last year, which no longer works:
$tmpFile_bandInfo = "C:\band.txt"
Write-Host "Downloading band $($kap.Nazev) ..." -NoNewline
Invoke-WebRequest http://www.colours.cz/ucinkujici/the-asteroids-galaxy-tour/ -OutFile $tmpFile_bandInfo
$content = gc $tmpFile_bandInfo -Encoding utf8 -raw
$ParsedHtml = New-Object -com "HTMLFILE"
$ParsedHtml.IHTMLDocument2_write($content)
$ParsedHtml.Close()
$bodyK = $ParsedHtml.body
$page = $bodyK.getElementsByClassName("body four column page") # this returns NULL
$page = $page.item(0)
$aside = $page.getElementsByTagName("aside").item(0)
$img = $aside.getElementsByTagName("img").item(0)
$imgPath = $img.src
This is the code I used to work around it:
$sec = $bodyK.getElementsByTagName("section") | ? ClassName -eq "body four column page"
# but now I have no innerHTML, only the lonely tag SECTION
# so I am walking through siblings
$img = $sec.nextSibling.nextSibling.nextSibling.getElementsByTagName("img").item(0)
$imgPath = $img.src
This works, but it seems like a silly solution to me.
Does anyone know what I am doing wrong?
I actually solved this problem by abandoning the Invoke-WebRequest cmdlet and adopting HtmlAgilityPack.
I transformed my former sequential HTML parsing into a few XPath queries (everything stayed in the PowerShell script). This solution is much more elegant, and HtmlAgilityPack is a real badass ;) It is really an honour to work with a project like this!
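For anyone who wants to try the same approach, here is a minimal sketch of what one such XPath query could look like. It is not my exact script: the DLL path is hypothetical, and the XPath assumes the image sits in an aside inside the section with the class from the question, so it may need adjusting to the real page structure.

```powershell
# Assumption: HtmlAgilityPack.dll has been downloaded to this (hypothetical) path.
Add-Type -Path "C:\libs\HtmlAgilityPack.dll"

# HtmlWeb downloads and parses the page in one step, so Invoke-WebRequest
# and the temporary file are no longer needed.
$web = New-Object HtmlAgilityPack.HtmlWeb
$doc = $web.Load("http://www.colours.cz/ucinkujici/the-asteroids-galaxy-tour/")

# Assumption: the <img> lives in an <aside> nested in the section with this class.
$xpath = "//section[@class='body four column page']//aside//img"
$img = $doc.DocumentNode.SelectSingleNode($xpath)

# SelectSingleNode returns $null when nothing matches, so check before use.
if ($null -ne $img) {
    # The second argument is the default returned when the attribute is missing.
    $imgPath = $img.GetAttributeValue("src", "")
    Write-Host "Image path: $imgPath"
} else {
    Write-Host "No matching <img> found; the XPath probably needs adjusting."
}
```

A single query replaces the chain of getElementsByTagName calls and the nextSibling walking, which is why the script got so much shorter.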