
Web Scraping Incentive Table


I am trying to scrape the Incentive Step Tracker table from the URL below. I am only interested in the Small Residential Storage section.

I got close, but my code does not return the full table. Please help me finish it and convert the result to CSV so I can save it to a local folder.

Here is my code:

# import libraries
from bs4 import BeautifulSoup
import urllib.request
import csv

urlpage='https://www.selfgenca.com/home/program_metrics/'

page = urllib.request.urlopen(urlpage)
# parse the html using beautiful soup and store in variable 'soup'
soup = BeautifulSoup(page, 'html.parser')
print(soup)

table = soup.find('table',{'class': 'table'}).find_all('tbody',{'data-t': 'Small Residential Storage'})[0]
results = table.find_all('tr')
print(results)

Here is the table I want to scrape:

[Image: Ideal Output Table]


Solution

  • I think this can be done with pandas, building on your code above. First, collect the column headers from the table's header row:

    import pandas as pd
    
    # Get the column headers from the header row
    header_rows = soup.find('table', {'class': 'table'}).find_all('tr', {'class': 'head-row'})
    headers = [h.text.strip() for h in header_rows[0].find_all('td')]
    

    Then collect the cells of each row and create a DataFrame:

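    final = []
    for res in results:
        # Direct child tags only (the cells); this skips whitespace text nodes
        cells = res.find_all(recursive=False)
        final.append([cell.text.strip() for cell in cells])
    
    # One DataFrame row per <tr>, labeled with the headers collected above
    df = pd.DataFrame(final, columns=headers)
    print(df)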
    

    The output should match the table you want.
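
The question also asks for CSV output. Here is a minimal sketch of that step using only the standard `csv` module that the question already imports. The helper name `save_table_as_csv`, the file name `incentive_steps.csv`, and the sample values are placeholders made up for illustration; in practice you would pass in the scraped `headers` and `final` lists.

```python
import csv
from pathlib import Path


def save_table_as_csv(headers, rows, path):
    """Write one header row followed by the data rows to a CSV file.

    Hypothetical helper, not part of the original answer.
    """
    path = Path(path)
    # newline="" lets the csv module control line endings itself
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(rows)
    return path


# Placeholder values standing in for the scraped `headers` and `final` lists
sample_headers = ["Column A", "Column B"]
sample_rows = [["a1", "b1"], ["a2", "b2"]]
save_table_as_csv(sample_headers, sample_rows, "incentive_steps.csv")
```

If you already have the DataFrame, `df.to_csv("incentive_steps.csv", index=False)` should give the same result in one line; `index=False` leaves out the row index column.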