python, selenium-webdriver, web-scraping, beautifulsoup

I'm trying to scrape payscale.com with BeautifulSoup to get some data, but I can't get it to work no matter what I try.


Here is my code:

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://www.payscale.com/college-salary-report/majors-that-pay-you-back/bachelors/"
response = requests.get(url)
soup_job = response.text
soup = BeautifulSoup(soup_job, "html.parser")
table = soup.find_all('table', class_="data-table")
print(table)

Even when I do the following, it still does not work. Can anyone help me, please?

page = 1
while page <= 34:
    response = requests.get(
        f"https://www.payscale.com/college-salary-report/majors-that-pay-you-back/bachelors/page/{page}")
    page += 1

I have also tried Selenium, but that did not work either. I would appreciate it if someone could review this and give me a hint on how to get it done.

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://www.payscale.com/college-salary-report/majors-that-pay-you-back/bachelors"
response = requests.get(url)
soup_job = response.text
soup = BeautifulSoup(soup_job, "html.parser")
table = soup.find_all('table', class_="data-table")
print(table)

When I print the table, I expect to see some text or data, but I get an empty list. Changing the class name gives the same result, and using find instead of find_all returns None rather than an empty list. I want the data inside that table; once I have it, I can extract the header and body of the table to get the values I need. Nothing has worked so far.
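One way to narrow down an empty result like this is to look at what the server actually returned before parsing it. The sketch below is only a rough triage: the function name `diagnose` and its return labels are made up for illustration, and the checks are plain string tests rather than a real HTML parse.

```python
def diagnose(status_code, html):
    """Return a rough guess at why a table lookup came back empty.

    The labels are hypothetical, chosen only for this illustration.
    """
    # 403 (Forbidden) and 429 (Too Many Requests) commonly indicate blocking.
    if status_code in (403, 429):
        return "blocked"
    if status_code != 200:
        return "http-error"
    # A 200 response without any <table> tag suggests that the table is
    # added later by JavaScript, or that a challenge page was served.
    if "<table" not in html.lower():
        return "no-table-in-html"
    # A table exists, but the class name being searched for does not.
    if "data-table" not in html:
        return "class-mismatch"
    return "table-present"


# Usage with the request from the question (needs network access):
#   response = requests.get(url)
#   print(response.status_code, diagnose(response.status_code, response.text))
```

If this reports anything other than "table-present", no choice of BeautifulSoup selector will find the table in that response, which would be consistent with the empty list and the None described above.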


Solution

  • I suspect that the site is blocking your requests, so you need to be a little stealthy, for example by driving a real browser with undetected_chromedriver.

    import time
    import undetected_chromedriver as uc
    import pandas as pd
    from bs4 import BeautifulSoup
    
    
    URL = "https://www.payscale.com/college-salary-report/majors-that-pay-you-back/bachelors/"
    
    driver = uc.Chrome(headless=False, use_subprocess=True)
    
    driver.get(URL)
    
    # Wait for page to fully load. Could actually wait for a specific element.
    time.sleep(5)
    
    soup = BeautifulSoup(driver.page_source, "lxml")
    
    data = []
    
    for row in soup.select_one("table.data-table > tbody").select("tr"):
        data.append({
            "rank": row.select_one("td.csr-col--rank .data-table__value").text,
            "school": row.select_one("td.csr-col--school-name .data-table__value").text,
            "degree": row.select_one("td.csr-col--school-type .data-table__value").text,
            "early": row.select_one("td:nth-of-type(4) .data-table__value").text,
            "mid": row.select_one("td:nth-of-type(5) .data-table__value").text,
        })
    
    driver.quit()
    
    data = pd.DataFrame(data)
    print(data)
    

    This is what the output should look like:

       rank                                            school     degree     early       mid
    0     1                             Petroleum Engineering  Bachelors   $98,100  $212,100
    1     2      Operations Research & Industrial Engineering  Bachelors  $101,200  $202,600
    2     3  Electrical Engineering & Computer Science (EECS)  Bachelors  $128,500  $192,300
    3     4                                Interaction Design  Bachelors   $77,400  $178,800
    4     5                                  Building Science  Bachelors   $71,100  $172,400
    5     6                  Applied Economics and Management  Bachelors   $81,200  $169,300
    6     7                             Actuarial Mathematics  Bachelors   $71,200  $167,500
    7     8                     Optical Science & Engineering  Bachelors   $81,500  $166,400
    8     9                            Quantitative Economics  Bachelors   $78,400  $165,100
    9    10                               Operations Research  Bachelors   $94,900  $164,900
    10   11                               Systems Engineering  Bachelors   $89,700  $163,800
    11   12                    Information & Computer Science  Bachelors   $73,200  $162,900
    12   13                                 Public Accounting  Bachelors   $71,500  $162,200
    13   14                                 Cognitive Science  Bachelors   $80,300  $162,100
    14   15                        Aeronautics & Astronautics  Bachelors   $89,800  $161,600
    15   16                                 Aerospace Studies  Bachelors   $64,500  $158,400
    16   17                                          Pharmacy  Bachelors   $71,500  $158,000
    17   18                              Managerial Economics  Bachelors   $78,200  $157,800
    18   19                                   Foreign Affairs  Bachelors   $65,200  $157,700
    19   20                                 Political Economy  Bachelors   $75,800  $156,700
    20   21                              Chemical Engineering  Bachelors   $87,700  $156,100
    21   21                  Marine Transportation Management  Bachelors   $78,500  $156,100
    22   23               Computer Science (CS) & Engineering  Bachelors   $93,500  $154,100
    23   24                    Corporate Accounting & Finance  Bachelors   $79,100  $154,000
    24   25                         Computer Engineering (CE)  Bachelors   $92,000  $153,800
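The comment next to `time.sleep(5)` notes that one could wait for a specific element instead of a fixed delay. A minimal sketch of that idea follows. It polls the driver until at least one table row is present. The helper name `wait_for_rows` and its defaults are assumptions for illustration, and it relies only on the driver having Selenium's `find_elements(by, value)` method.

```python
import time


def wait_for_rows(driver, css="table.data-table > tbody tr", timeout=15.0, poll=0.5):
    """Poll until at least one element matches `css`, then return the matches.

    Raises TimeoutError if nothing matches within `timeout` seconds.
    `driver` is assumed to expose Selenium's find_elements(by, value).
    """
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the string value behind Selenium's By.CSS_SELECTOR.
        rows = driver.find_elements("css selector", css)
        if rows:
            return rows
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no element matched {css!r} within {timeout} s")
        time.sleep(poll)
```

In the script above, this would stand in for the fixed sleep: call `wait_for_rows(driver)` after `driver.get(URL)` and before reading `driver.page_source`. Selenium also ships `WebDriverWait` together with `expected_conditions.presence_of_element_located`, which covers the same need without a hand-written loop.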