powershellcsvweb-scraping

Webscrape table with powershell


My goal is webscrape this page and collect the data in the first table and export as a csv file:

https://www.interactivebrokers.co.uk/en/trading/margin-futures-fops.php

Margin Requirements Chicago Board Of Trade (CBOT):

$WebResponse = Invoke-WebRequest -Uri "https://www.interactivebrokers.co.uk/en/trading/margin-futures-fops.php"

$table = ($WebResponse.AllElements | Where {$_.class -match 'table-responsive'}).outerhtml

However, the query still returns A LOT of HTML. Is anyone able to shed some light on this to get me closer to my goal? Thanks in advance.


Solution

  • If you want to extract the data from an HTML table, I recommend taking advantage of existing tools others have shared. If you take a look at the powershell module PSParseHtml, it has a function called ConvertFrom-HtmlTable. This module uses other great tools to make this easy as pie for you and I. Thanks PrzemyslawKlys!

    We will install the module.

    Install-Module -Name PSParseHTML -Force
    

    Then we will extract the HTML with some basic regex. (Benefit of -UseBasicParsing is no longer depending on Internet Explorer components)

    $url = 'https://www.interactivebrokers.co.uk/en/trading/margin-futures-fops.php'
    
    $content = Invoke-WebRequest $url -UseBasicParsing
    
    $content -match '(?s)<div id="cbot.+?/div>' | Out-Null
    

    Finally, we will pass that selected bit of HTML to the function.

    $tabledata = ConvertFrom-HTMLTable -htmlString $matches.Values -all
    

    Now you have the table contents nicely parsed and stored in $tabledata. From there you can inspect it, export it, etc

    $tabledata | Format-Table
    $tabledata | Out-GridView
    $tabledata | Export-Csv -Path \path\to\file.csv -NoType
    

    enter image description here