Tags: powershell, web-scraping, html-agility-pack

Web scraping a dynamic-content table in PowerShell using the PowerHTML module


I'm getting an error when I try to read the contents of a table on the web page referenced in the script. Can anyone please help me fix it? Thanks.

@mklement0, thanks for the detailed explanation. With your help, I was able to extract the table information. However, I'm still unable to extract the table rows: they are still returned as null. Can you please help? Please see below. Thanks.

```powershell
# Requires the PowerHTML module (Install-Module PowerHTML), which provides ConvertFrom-Html.
Import-Module PowerHTML

$wc = New-Object System.Net.WebClient
$res = $wc.DownloadString('https://datatables.net/examples/data_sources/ajax.html')
$html = ConvertFrom-Html -Content $res

$ScrapeData = [System.Collections.ArrayList]::new()
$table = $html.SelectNodes('//table') |
    Where-Object { $_.HasClass("display") -or $_.HasClass("dataTable") } |
    Select-Object -First 1

# './/tr' (leading dot) searches only inside $table; '//tr' would search the whole document.
# The odd/even filter already excludes header rows, so no row needs to be skipped.
foreach ($row in $table.SelectNodes('.//tr') | Where-Object { $_.HasClass("odd") -or $_.HasClass("even") })
{
    #$name= $row.SelectSingleNode('//th').innerText.Trim() | Where-Object { $_.HasClass('sorting_1')}
    $value = $row.SelectSingleNode('td').InnerText.Trim() -replace "\?", " "
    # A NoteProperty needs a name; [void] discards the index returned by ArrayList.Add().
    # (Using += on an ArrayList would silently turn it into a fixed-size array.)
    [void] $ScrapeData.Add([pscustomobject] @{ Value = $value })
}

Write-Output 'Extracted Table Information'
$table

Write-Output 'Extracted Book Details Parsed from HTML table'
$ScrapeData
```

Extracted data as below: [screenshot not available]


Solution
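The `ajax.html` example fills its table with rows that JavaScript fetches (via Ajax) after the page loads, so the static HTML returned by `DownloadString` should contain the table's header and footer rows but no data rows, and no rows carrying the `odd`/`even` classes that DataTables adds at runtime. One workaround, sketched here under several assumptions, is to skip the HTML entirely and read the table's JSON data source. The function name `ConvertFrom-DataTablesJson` is hypothetical, the payload shape `{"data": [[...], ...]}` (one array of cell values per row) is assumed, and the column names below are assumed to match the example table's header; the sample JSON is made-up illustrative data.

```powershell
# Hypothetical helper: converts a DataTables-style JSON payload, assumed to have the
# shape {"data": [[cell, cell, ...], ...]}, into one PowerShell object per table row.
function ConvertFrom-DataTablesJson {
    param(
        [Parameter(Mandatory)] [string] $Json,
        [Parameter(Mandatory)] [string[]] $ColumnNames
    )
    $payload = $Json | ConvertFrom-Json
    foreach ($rowValues in $payload.data) {
        # Pair each column name with the value at the same position in the row array.
        $props = [ordered] @{}
        for ($i = 0; $i -lt $ColumnNames.Count; $i++) {
            $props[$ColumnNames[$i]] = $rowValues[$i]
        }
        [pscustomobject] $props
    }
}

# Assumed column order of the example table (verify against the page's header row).
$columns = 'Name', 'Position', 'Office', 'Extn.', 'Start date', 'Salary'

# Made-up sample payload; single quotes keep '$' from being expanded.
$sampleJson = '{"data": [["Example A", "Engineer", "London", "1001", "2011/04/25", "$100,000"],
                         ["Example B", "Analyst", "Tokyo", "1002", "2012/03/29", "$90,000"]]}'

$rows = ConvertFrom-DataTablesJson -Json $sampleJson -ColumnNames $columns
$rows | Format-Table
```

Against the live site, a call along the lines of `ConvertFrom-DataTablesJson -Json (New-Object System.Net.WebClient).DownloadString('https://datatables.net/examples/ajax/data/arrays.txt') -ColumnNames $columns` should return the real rows, assuming that URL (inferred from the example's configuration; confirm it in the browser's developer tools, Network tab) is still the table's data source. Downloading the JSON as text and parsing it with `ConvertFrom-Json` avoids depending on `Invoke-RestMethod` auto-detecting JSON, which it may not do for a file served as plain text.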


  • [1] In Windows PowerShell, the legacy, ships-with-Windows, Windows-only edition of PowerShell (whose latest and last version is v5.1), as opposed to the modern, cross-platform PowerShell (Core) 7+ edition, you may still be able to use built-in features to access dynamic content. However, because these features rely on the long-obsolete Internet Explorer, this approach will work with fewer and fewer websites over time.