I load an HTML page into a variable using 'Invoke-WebRequest', and I'd like to extract content from a specific table in it.
$Result = Invoke-WebRequest 'https://www.dailyfaceoff.com/teams/anaheim-ducks/line-combinations'
Unfortunately, $Result.ParsedHtml does not return any result, so I was looking at using a regex to find the string. This is where I need your help.
Requested actions:
HTML Structure:
<body ...>
<div ...>
<tbody>
<td id="LW1">
<a ....>
<span class="player-name">Hello World</span>
</a>
</td>
</tbody>
</div>
</body>
Thanks in advance for any input or help!
Try 1:
$r = Invoke-WebRequest 'https://www.dailyfaceoff.com/teams/anaheim-ducks/line-combinations'
$table = $r.ParsedHtml.getElementsByTagName("table")
Result 1: No output; it looks like the HTML structure is preventing parsing.
Try 2:
$r = Invoke-WebRequest 'https://www.dailyfaceoff.com/teams/anaheim-ducks/line-combinations'
$string = ($r.Content |
where {$_ -match '^a href.*LW1.*\ title=.*>/span.*'}) -replace '.*>'
Result 2: Regex not matching
Please don't try to parse HTML with regex; that's a terrible idea. On Windows, you can do this in both PowerShell Core and Windows PowerShell using the htmlfile COM object:
$com = New-Object -ComObject htmlfile
$com.write([System.Text.Encoding]::Unicode.GetBytes(@'
<body>
<div>
<tbody>
<td id="LW1">
<a><span class="player-name">Hello World</span></a>
</td>
</tbody>
</div>
</body>
'@))
$com.getElementsByClassName('player-name') | ForEach-Object innerHtml
# Outputs: Hello World
$null = [System.Runtime.InteropServices.Marshal]::ReleaseComObject($com)
Alternatively, if the markup is well-formed XML, you can use XmlDocument:
$xml = [xml]::new()
$xml.LoadXml(@'
<body>
<div>
<tbody>
<td id="LW1">
<a><span class="player-name">Hello World</span></a>
</td>
</tbody>
</div>
</body>
'@)
$xml.SelectSingleNode("//span[@class='player-name']").InnerText
# Outputs: Hello World
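If the page has several player-name spans and you only want the one inside the LW1 cell, the XPath can be narrowed to that cell. The following is a minimal sketch, assuming the markup is well-formed XML like the sample in the question; the second cell (id "C1", "Another Player") is hypothetical and was added only to show that the query selects the right cell.

```powershell
# Sample markup based on the question. The second <td> (id "C1") is a
# hypothetical addition used to show that only the LW1 cell is selected.
$markup = @'
<body>
<div>
<tbody>
<td id="LW1">
<a><span class="player-name">Hello World</span></a>
</td>
<td id="C1">
<a><span class="player-name">Another Player</span></a>
</td>
</tbody>
</div>
</body>
'@

$xml = [xml]::new()
$xml.LoadXml($markup)   # throws if the markup is not well-formed XML

# Restrict the search to the cell with id="LW1", then take the
# player-name span anywhere below it.
$node = $xml.SelectSingleNode("//td[@id='LW1']//span[@class='player-name']")

# SelectSingleNode returns $null when nothing matches, so guard before reading.
$name = if ($null -ne $node) { $node.InnerText.Trim() } else { $null }
$name
```

Real-world pages are often not valid XML, in which case LoadXml will throw and the COM approach is the safer choice.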