I am trying to scrape the Incentive Step Tracker table from the URL in the code below. I am only interested in the Small Residential Storage section.
My code gets close, but it does not return the full table. Please help me finish it and convert the result to CSV so that I can save it to a local folder.
Here is my code:
# import libraries
from bs4 import BeautifulSoup
import urllib.request
import csv
urlpage='https://www.selfgenca.com/home/program_metrics/'
page = urllib.request.urlopen(urlpage)
# parse the html using beautiful soup and store in variable 'soup'
soup = BeautifulSoup(page, 'html.parser')
print(soup)
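# pick the <tbody> that holds the Small Residential Storage rows
table = soup.find('table',{'class': 'table'}).find_all('tbody',{'data-t': 'Small Residential Storage'})[0]
# collect every row in that section
results = table.find_all('tr')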
print(results)
Here is the table I want to scrape:
I think this can be done with pandas by adding the following to your code above:
import pandas as pd
# get the column headers from the table's header row
tab = soup.find('table', {'class': 'table'}).find_all('tr', {'class': 'head-row'})
headers = []
for h in tab[0].find_all('td'):
    headers.append(h.text)
Then build a DataFrame from the rows:
final = []
for res in results:
    tmp = []
    for r in res:
        # skip the bare text nodes between cells; keep only the tags
        if 'NavigableString' not in str(type(r)):
            tmp.append(r.text.strip())
    final.append(tmp)
df = pd.DataFrame(final, columns=headers)
df
The output looks like the table you want.
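The question also asks for a CSV file saved locally, which the code so far does not cover. Here is a minimal sketch of that step using the standard-library csv module that the question already imports. The helper name save_table_csv, the sample header names and rows, and the output file name are all placeholders for illustration, not values taken from the real page; in practice you would pass in the headers and final lists built earlier.

```python
import csv
import os
import tempfile

def save_table_csv(headers, rows, path):
    """Write one header row followed by the data rows to a CSV file."""
    # newline='' keeps the csv module from adding blank lines on Windows
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(rows)

# Placeholder data standing in for `headers` and `final` (not real page values)
headers = ['Step', 'Status', 'Rate']
final = [['Step 1', 'Closed', '$0.50'], ['Step 2', 'Open', '$0.40']]

# Hypothetical output location; change it to the local folder you want
out_path = os.path.join(tempfile.gettempdir(), 'small_residential_storage.csv')
save_table_csv(headers, final, out_path)
```

If you already have the DataFrame, df.to_csv(out_path, index=False) should write the same file in one call; index=False leaves out the row-index column.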