Here is my code:
import pandas as pd
import requests
from bs4 import BeautifulSoup
url = "https://www.payscale.com/college-salary-report/majors-that-pay-you-back/bachelors/"
response = requests.get(url)
soup_job = response.text
soup = BeautifulSoup(soup_job, "html.parser")
table = soup.find_all('table', class_="data-table")
print(table)
Even when I do this, it still does not work. Can anyone help me, please?
page = 1
while page <= 34:
    response = requests.get(
        f"https://www.payscale.com/college-salary-report/majors-that-pay-you-back/bachelors/page/{page}")
    page += 1
I have also used Selenium to try to get this done, but it is still not working. I am hoping someone can review my code and give me a hint on how to do it.
When I print the table, I expect to get some text or data, but I get an empty list. I changed the class name to see if that would help, but the result is the same, and when I used find instead of find_all, it returned None instead of an empty list. I want to get the data inside the table; after that, I will be able to extract the head and the body of the table to get the data I want. Nothing has worked for me so far.
I suspect that you are being blocked by the site. You need to be a little stealthy.
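Before switching tools, it may be worth confirming that suspicion by looking at what the server actually returned. Below is a minimal sketch, assuming that a 403 or 429 status means the request was refused and that the string "data-table" appears in the HTML whenever the table is present. The helper name `diagnose_response` is hypothetical, not part of requests or BeautifulSoup.

```python
def diagnose_response(status_code: int, html: str, marker: str = "data-table") -> str:
    """Roughly classify a raw HTTP response for scraping purposes.

    Assumptions (not confirmed for this site): 403/429 indicate the request
    was refused, and `marker` occurs in the HTML only when the table is there.
    """
    if status_code in (403, 429):
        return "blocked"        # server refused or rate-limited the request
    if status_code != 200:
        return "http-error"     # some other failure (404, 500, ...)
    if marker not in html:
        return "table-missing"  # page arrived, but without the table markup
    return "ok"


# Possible usage with the question's code (needs network access):
#   response = requests.get(url)
#   print(diagnose_response(response.status_code, response.text))
```

A result of "blocked" or "table-missing" would explain the empty list from find_all: either the server refused the request or the table is only added by JavaScript after the page loads. In both cases, driving a real browser, as in the code that follows, is one way around it.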
import time
import undetected_chromedriver as uc
import pandas as pd
from bs4 import BeautifulSoup
URL = "https://www.payscale.com/college-salary-report/majors-that-pay-you-back/bachelors/"
driver = uc.Chrome(headless=False, use_subprocess=True)
driver.get(URL)
# Wait for page to fully load. Could actually wait for a specific element.
time.sleep(5)
soup = BeautifulSoup(driver.page_source, "lxml")
data = []
for row in soup.select_one("table.data-table > tbody").select("tr"):
    data.append({
        "rank": row.select_one("td.csr-col--rank .data-table__value").text,
        "school": row.select_one("td.csr-col--school-name .data-table__value").text,
        "degree": row.select_one("td.csr-col--school-type .data-table__value").text,
        "early": row.select_one("td:nth-of-type(4) .data-table__value").text,
        "mid": row.select_one("td:nth-of-type(5) .data-table__value").text,
    })
driver.quit()
data = pd.DataFrame(data)
print(data)
This is what the output should look like:
    rank  school                                            degree        early       mid
 0     1  Petroleum Engineering                             Bachelors   $98,100  $212,100
 1     2  Operations Research & Industrial Engineering      Bachelors  $101,200  $202,600
 2     3  Electrical Engineering & Computer Science (EECS)  Bachelors  $128,500  $192,300
 3     4  Interaction Design                                Bachelors   $77,400  $178,800
 4     5  Building Science                                  Bachelors   $71,100  $172,400
 5     6  Applied Economics and Management                  Bachelors   $81,200  $169,300
 6     7  Actuarial Mathematics                             Bachelors   $71,200  $167,500
 7     8  Optical Science & Engineering                     Bachelors   $81,500  $166,400
 8     9  Quantitative Economics                            Bachelors   $78,400  $165,100
 9    10  Operations Research                               Bachelors   $94,900  $164,900
10    11  Systems Engineering                               Bachelors   $89,700  $163,800
11    12  Information & Computer Science                    Bachelors   $73,200  $162,900
12    13  Public Accounting                                 Bachelors   $71,500  $162,200
13    14  Cognitive Science                                 Bachelors   $80,300  $162,100
14    15  Aeronautics & Astronautics                        Bachelors   $89,800  $161,600
15    16  Aerospace Studies                                 Bachelors   $64,500  $158,400
16    17  Pharmacy                                          Bachelors   $71,500  $158,000
17    18  Managerial Economics                              Bachelors   $78,200  $157,800
18    19  Foreign Affairs                                   Bachelors   $65,200  $157,700
19    20  Political Economy                                 Bachelors   $75,800  $156,700
20    21  Chemical Engineering                              Bachelors   $87,700  $156,100
21    21  Marine Transportation Management                  Bachelors   $78,500  $156,100
22    23  Computer Science (CS) & Engineering               Bachelors   $93,500  $154,100
23    24  Corporate Accounting & Finance                    Bachelors   $79,100  $154,000
24    25  Computer Engineering (CE)                         Bachelors   $92,000  $153,800
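
A note on the fixed time.sleep(5) in the code above: as the comment there says, you could wait for a specific element instead. Selenium ships WebDriverWait together with expected_conditions for exactly this. To show the idea without depending on a browser, here is a hand-rolled polling sketch using only the standard library. The helper name `wait_until` and its default timings are my own assumptions, not part of Selenium or undetected_chromedriver.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Call predicate() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without a truthy result.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Possible usage in place of time.sleep(5), assuming `driver` is the
# uc.Chrome instance from the code above:
#   from selenium.webdriver.common.by import By
#   wait_until(lambda: driver.find_elements(By.CSS_SELECTOR, "table.data-table > tbody > tr"))
```

The script then continues as soon as the table rows exist rather than always pausing for five seconds, and it fails loudly if they never appear.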