My goal is webscrape this page and collect the data in the first table and export as a csv file:
https://www.interactivebrokers.co.uk/en/trading/margin-futures-fops.php
Margin Requirements Chicago Board Of Trade (CBOT):
$WebResponse = Invoke-WebRequest -Uri "https://www.interactivebrokers.co.uk/en/trading/margin-futures-fops.php"
$table = ($WebResponse.AllElements | Where {$_.class -match 'table-responsive'}).outerhtml
However, the query still returns A LOT of HTML. Is anyone able to shed some light on this to get me closer to my goal? Thanks in advance.
If you want to extract the data from an HTML table, I recommend taking advantage of existing tools others have shared. If you take a look at the powershell module PSParseHtml, it has a function called ConvertFrom-HtmlTable
. This module uses other great tools to make this easy as pie for you and I. Thanks PrzemyslawKlys!
We will install the module.
Install-Module -Name PSParseHTML -Force
Then we will extract the HTML with some basic regex. (Benefit of -UseBasicParsing
is no longer depending on Internet Explorer components)
$url = 'https://www.interactivebrokers.co.uk/en/trading/margin-futures-fops.php'
$content = Invoke-WebRequest $url -UseBasicParsing
$content -match '(?s)<div id="cbot.+?/div>' | Out-Null
Finally, we will pass that selected bit of HTML to the function.
$tabledata = ConvertFrom-HTMLTable -htmlString $matches.Values -all
Now you have the table contents nicely parsed and stored in $tabledata
. From there you can inspect it, export it, etc
$tabledata | Format-Table
$tabledata | Out-GridView
$tabledata | Export-Csv -Path \path\to\file.csv -NoType