python loops web-scraping

How do I iterate through table rows in Python?



How would I loop through HTML table rows in Python? I am working with this website: https://schools.texastribune.org/districts/. What I'm trying to do is follow each link in the table body and extract the total number of students from the page it leads to.

What I have so far:

import requests
import pandas as pd
from bs4 import BeautifulSoup

response = requests.get("https://schools.texastribune.org/districts/")

soup = BeautifulSoup(response.text)

data = []

for a in soup.find_all('a', {'class': 'table table-striped'}):

    response = requests.get(a.get('href'))
    asoup = BeautifulSoup(response.text)
    data.append({
        'url': a.get('href'),
        'title': a.h2.get_text(strip=True),
        'content': asoup.article.get_text(strip=True)
    })

pd.DataFrame(data)

This is my first ever time web scraping something.


Solution

  • Don't pass class_="td" when finding the <td> elements; they don't have any class.

    There are no <ul> elements in the table, so view = match.find('ul', class_="tr") won't find anything. You need to find the <a> element in each cell, get its href, and load that page to get the total students.

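    import requests
    from bs4 import BeautifulSoup

    BASE_URL = "https://schools.texastribune.org"
    soup = BeautifulSoup(requests.get(BASE_URL + "/districts/").text, "lxml")

    d = {}
    for match in soup.find_all("td"):
        link = match.find("a")
        # Read attributes with .get(); link.href would look for a child <href> tag.
        if link and link.get("href"):
            school_page = requests.get(BASE_URL + link.get("href"))
            school_soup = BeautifulSoup(school_page.text, "lxml")
            # Find the "Total students" label, then step up to its enclosing metric <div>.
            # The "metric" and "metric-value" class names are assumed, not verified.
            label = school_soup.find(string=lambda s: s and "Total students" in s)
            total_div = label.find_parent("div", class_="metric") if label else None
            if total_div:
                amount = total_div.find("p", class_="metric-value")
                if amount:
                    d[link.text] = amount.get_text(strip=True)

    print(d)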
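
    If you want to check the parsing logic without hitting the site on every run, here is a minimal offline sketch of the same two steps: collect the link from each table-body row, then read the "Total students" value from a district page. The two HTML strings are hypothetical stand-ins and the real markup may differ. The helper names district_links and total_students are made up for illustration, and the class names (table-striped, metric, metric-value) are taken from the question and the answer above rather than verified against the live pages.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://schools.texastribune.org"

# Hypothetical stand-ins for the real pages; the live markup may differ.
LISTING_HTML = """
<table class="table table-striped">
  <thead><tr><th>District</th><th>County</th></tr></thead>
  <tbody>
    <tr><td><a href="/districts/example-isd/">Example ISD</a></td><td>Travis</td></tr>
    <tr><td><a href="/districts/sample-isd/">Sample ISD</a></td><td>Harris</td></tr>
    <tr><td>No link here</td><td>Bexar</td></tr>
  </tbody>
</table>
"""

DISTRICT_HTML = """
<div class="metric">
  <p class="metric-title">Total students</p>
  <p class="metric-value">1,234</p>
</div>
"""


def district_links(listing_html):
    """Return (name, absolute URL) for the first link in each table-body row."""
    soup = BeautifulSoup(listing_html, "html.parser")
    links = []
    for row in soup.select("table.table-striped tbody tr"):
        link = row.find("a", href=True)  # rows without a link are skipped
        if link:
            links.append((link.get_text(strip=True), urljoin(BASE_URL, link["href"])))
    return links


def total_students(district_html):
    """Return the 'Total students' value as an int, or None if it is absent."""
    soup = BeautifulSoup(district_html, "html.parser")
    # Find the label text first, then climb to the enclosing metric <div>.
    label = soup.find(string=lambda s: s and "Total students" in s)
    metric = label.find_parent("div", class_="metric") if label else None
    value = metric.find("p", class_="metric-value") if metric else None
    if value is None:
        return None
    return int(value.get_text(strip=True).replace(",", ""))


links = district_links(LISTING_HTML)
count = total_students(DISTRICT_HTML)
print(links)
print(count)
```

    Once this works on the samples, you can pass requests.get(url).text into the same two functions for each URL from district_links. Looping over the rows (tr) rather than every td also makes it easy to pull other columns from the same row later.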