python beautifulsoup html-table extract

Extracting values from an HTML table using BeautifulSoup4 (2nd row onwards, 1st and 6th column)


I am new to Python and need some guidance on extracting values from specific cells of an HTML table.

The URL that I am working on can be found here

I am looking to get the first 5 values only in the Month and Settlement columns and subsequently display them as:

"MAR 14:426'6"

The problems I am facing are:

  1. How do I get the loop to start from the 3rd "TR" in the table?
  2. How do I get only the values for td[0] and td[6]?
  3. How do I restrict the loop to retrieve values for only 5 rows?

This is the code that I am working on:

tableData = soup1.find("table", id="DailySettlementTable")
for rows in tableData.findAll('tr'):
    month = rows.find('td')
    print month

Thank you and appreciate any form of guidance!


Solution

  • You probably want to use slicing.

    Here's a modified snippet for your code:

    table = soup.find('table', id='DailySettlementTable')
    
    # The slice [2:7] takes the rows at indices 2 through 6 (the end
    # index is exclusive), i.e. the third through seventh rows: five in total.
    for row in table.find_all('tr')[2:7]:
        cells = row.find_all('td')
        month = cells[0]
        settle = cells[6]
    
        # Parenthesized so the line works under both Python 2 and Python 3.
        print(month.string + ':' + settle.string)
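
For anyone who wants to try the slicing approach without fetching the real page, here is a minimal self-contained sketch. Everything about the sample markup is an assumption: the table is a made-up stand-in with two header rows and seven cells per data row, the settlement values other than the one quoted in the question are invented placeholders, and the function name `first_settlements` and its parameters are hypothetical. It also swaps `.string` for `get_text(strip=True)`, which is a choice made here rather than part of the original answer.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: two header rows, then data rows.
HEADER = "<tr><th>Month</th></tr><tr><th>sub-header</th></tr>"


def make_row(month, settle):
    # Seven cells per row: month at index 0, settlement price at index 6.
    filler = "<td>-</td>" * 5
    return "<tr><td>{}</td>{}<td>{}</td></tr>".format(month, filler, settle)


# Placeholder data; only the first pair comes from the question itself.
SAMPLE_ROWS = [
    ("MAR 14", "426'6"), ("MAY 14", "432'2"), ("JUL 14", "437'0"),
    ("SEP 14", "440'4"), ("DEC 14", "446'2"), ("MAR 15", "455'0"),
]
SAMPLE_HTML = (
    '<table id="DailySettlementTable">'
    + HEADER
    + "".join(make_row(m, s) for m, s in SAMPLE_ROWS)
    + "</table>"
)


def first_settlements(html, skip=2, limit=5, month_col=0, settle_col=6):
    """Return 'MONTH:SETTLE' strings for `limit` rows after skipping `skip` rows."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="DailySettlementTable")
    if table is None:
        return []
    results = []
    # The slice end is exclusive, so [2:7] covers indices 2..6 (five rows).
    for row in table.find_all("tr")[skip:skip + limit]:
        cells = row.find_all("td")
        if len(cells) <= max(month_col, settle_col):
            continue  # row has too few cells, e.g. a header or spacer row
        month = cells[month_col].get_text(strip=True)
        settle = cells[settle_col].get_text(strip=True)
        results.append(month + ":" + settle)
    return results


for line in first_settlements(SAMPLE_HTML):
    print(line)
```

The reason for `get_text()` here is that `.string` returns `None` when a cell contains more than one child node (for example text plus a nested tag), which would make the string concatenation fail. The length check guards against rows with fewer cells than expected.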