[SOLVED] Webscrape table with powershell

Webscrape table with powershell

My goal is webscrape this page and collect the data in the first table and export as a csv file:

https://www.interactivebrokers.co.uk/en/trading/margin-futures-fops.php

Margin Requirements Chicago Board Of Trade (CBOT):

$WebResponse = Invoke-WebRequest -Uri "https://www.interactivebrokers.co.uk/en/trading/margin-futures-fops.php"

$table = ($WebResponse.AllElements | Where {$_.class -match 'table-responsive'}).outerhtml

However, the query still returns A LOT of HTML. Is anyone able to shed some light on this to get me closer to my goal? Thanks in advance.

Solution

If you want to extract the data from an HTML table, I recommend taking advantage of existing tools others have shared. If you take a look at the powershell module PSParseHtml, it has a function called ConvertFrom-HtmlTable. This module uses other great tools to make this easy as pie for you and I. Thanks PrzemyslawKlys!

We will install the module.

Install-Module -Name PSParseHTML -Force

Then we will extract the HTML with some basic regex. (Benefit of -UseBasicParsing is no longer depending on Internet Explorer components)

$url = 'https://www.interactivebrokers.co.uk/en/trading/margin-futures-fops.php'

$content = Invoke-WebRequest $url -UseBasicParsing

$content -match '(?s)<div id="cbot.+?/div>' | Out-Null

Finally, we will pass that selected bit of HTML to the function.

$tabledata = ConvertFrom-HTMLTable -htmlString $matches.Values -all

Now you have the table contents nicely parsed and stored in $tabledata. From there you can inspect it, export it, etc

$tabledata | Format-Table
$tabledata | Out-GridView
$tabledata | Export-Csv -Path \path\to\file.csv -NoType