I'm trying to get the contents of an HTML table using XPath. I'm using MechanicalSoup to grab the form and submit it (the data is behind a submission form). Once I reach the second form, I grab its URL and pass it on for parsing, but I'm getting AttributeError: 'list' object has no attribute 'xpath'.
import mechanicalsoup
import requests
from lxml import html
from lxml import etree
# Use MechanicalSoup to grab the form, submit it, and find the data table
browser = mechanicalsoup.StatefulBrowser()
winnet = "http://winnet.wartburg.edu/coursefinder/"
browser.open(winnet)
Searchform = browser.select_form()
Searchform.choose_submit('ctl00$ContentPlaceHolder1$FormView1$Button_FindNow')
response1 = browser.submit_selected() #This Progresses to Second Form
dataURL = 'https://winnet.wartburg.edu/coursefinder/Results.aspx' #Get URL of Second Form w/ Data
pageContent=requests.get(dataURL)
tree = html.fromstring(pageContent.content)
dataTable = tree.xpath('//*[@id="ctl00_ContentPlaceHolder1_GridView1"]')
print(dataTable)
for row in dataTable.xpath(".//tr")[1:]:
    print([cell.text_content() for cell in row.xpath(".//td")])
#XPath to Table
#//*[@id="ctl00_ContentPlaceHolder1_GridView1"]
I'd post the HTML I'm trying to parse, but it is very long and, like some other sites I've worked with, very sloppily written.
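Sloppy markup is usually less of an obstacle for XPath than it looks: `lxml.html` parses with libxml2's lenient HTML parser, which closes unterminated tags such as `<td>` and `<tr>` on its own. A minimal sketch using a hypothetical, deliberately malformed snippet (not the real page):

```python
from lxml import html

# Hypothetical, deliberately sloppy markup: no </td> or </tr> closing tags.
SLOPPY = """
<html><body>
<table id="grid">
  <tr><td>AC 121<td>01
  <tr><td>AC 122<td>01
</table>
</body></html>
"""

tree = html.fromstring(SLOPPY)
# The parser implicitly closes each open <td>/<tr>, so every row
# still ends up with two separate cells that XPath can address.
rows = [
    [cell.text_content().strip() for cell in row.xpath("./td")]
    for row in tree.xpath('//table[@id="grid"]//tr')
]
print(rows)
```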
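I'm not sure, but I believe you're after something like this. The error comes from the fact that tree.xpath() always returns a list of matching elements (even when only one element has that id), so you need to take the table element itself, dataTable[0], before calling .xpath() on it. If this isn't quite it, you can probably adapt it to get where you want to be.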
import pandas as pd
rows = []  # collect one list of cell strings per table row
for row in dataTable[0].xpath(".//tr")[1:]:  # [0] is the <table> element; [1:] skips the first (header) row
    rows.append([cell.text_content().strip() for cell in row.xpath(".//td")])
df = pd.DataFrame(rows)  # load the collected rows into a DataFrame
df  # displays the DataFrame in a notebook/REPL; use print(df) in a plain script
Output (pardon the formatting):
View Details AC 121 01 Principles of Accounting I Pilcher, A M W F 10:45AM-11:50AM 45/40/0 WBC 116 2019-20 WI 1.00
View Details AC 122 01 Principles of Accounting II Pilcher, A MWF 12:00PM-1:05PM 45/42/0 WBC 116 2019-20 WI 1.00
etc.
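The same approach can be packaged as a small helper that parses whatever HTML it is handed instead of fetching Results.aspx a second time. The question downloads that page with a separate requests.get(), which starts a fresh session without the cookies MechanicalSoup's browser collected. If the results depend on that session (an assumption, not verified against this site) and the submission lands on the results page, parsing the response returned by submit_selected() avoids the extra request. A minimal sketch; `grid_rows` is a hypothetical name, and only the GridView id comes from the question:

```python
from lxml import html

# The GridView id is taken from the question; the rest is a sketch.
GRID_XPATH = '//*[@id="ctl00_ContentPlaceHolder1_GridView1"]'


def grid_rows(page_content):
    """Return the data rows of the results grid as lists of cell strings.

    page_content: raw HTML as bytes or str, e.g. response1.content.
    Hypothetical helper, not part of MechanicalSoup or lxml.
    """
    tree = html.fromstring(page_content)
    matches = tree.xpath(GRID_XPATH)  # xpath() returns a list of elements
    if not matches:
        raise ValueError("results grid not found in the page")
    table = matches[0]  # the <table> element itself
    return [
        [cell.text_content().strip() for cell in row.xpath(".//td")]
        for row in table.xpath(".//tr")[1:]  # skip the first (header) row
    ]


# Usage against the live site (not executed here):
#   response1 = browser.submit_selected()
#   rows = grid_rows(response1.content)
#   df = pd.DataFrame(rows)
```

Raising ValueError when the grid is missing makes a lost session or a changed page id fail loudly instead of producing an empty DataFrame.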