I am learning the BeautifulSoup library in Python and came across the urllib library while trying to understand the HTTP request-response cycle.
In the following code, I'm trying to scrape all the anchor tags on an HTML page, but I'm getting an error: NameError: name 'html' is not defined
I searched Google and found the following relevant StackOverflow question: https://stackoverflow.com/questions/36113811/name-error-html-not-defined-with-beautifulsoup4
I tried the solution given there, but it didn't work.
import urllib
from bs4 import BeautifulSoup
url=input('Enter- ')
req_file=urllib.request.urlopen(url).read()
soup=BeautifulSoup(html,"html.parser")
tags=soup('a')
for tag in tags:
    print(tag.get('href',None))
You're storing the result of the read in the variable req_file:
req_file=urllib.request.urlopen(url).read()
but when you pass it to BeautifulSoup, the code refers to a variable named html, which has never been assigned, hence the "name 'html' is not defined" error:
soup=BeautifulSoup(html,"html.parser")
So you can either store the result of .read() in a variable named html:
html=urllib.request.urlopen(url).read()
soup=BeautifulSoup(html,"html.parser")
or pass what you originally stored, req_file, to BeautifulSoup:
req_file=urllib.request.urlopen(url).read()
soup=BeautifulSoup(req_file,"html.parser")
Hope the explanation helps. I'm still learning BeautifulSoup myself, but I remember all the struggles at the beginning. It's fun once you get the hang of it.
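If it helps to see the failure in isolation, here is a minimal sketch that needs no network access. The byte string stands in for the downloaded page, and parse is a hypothetical placeholder for the BeautifulSoup call. Both are assumptions for illustration and are not part of the original code.

```python
# Made-up bytes standing in for urllib.request.urlopen(url).read()
req_file = b"<a href='https://example.com'>example</a>"


def parse(markup):
    # Hypothetical placeholder for BeautifulSoup(markup, "html.parser")
    return markup.decode()


try:
    # 'html' was never assigned, so Python fails while evaluating the
    # argument, before parse() is even called.
    parse(html)
except NameError as err:
    message = str(err)

print(message)

# Passing the name that was actually assigned works.
parsed = parse(req_file)
print(parsed)
```

The point is that the error has nothing to do with BeautifulSoup or urllib. Python simply cannot find a variable called html.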
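import urllib.request  # import the submodule explicitly; "import urllib" alone may not load it
from bs4 import BeautifulSoup
url=input('Enter- ')
# Download the page and keep the raw bytes in req_file
req_file=urllib.request.urlopen(url).read()
# Parse the same variable that was assigned above
soup=BeautifulSoup(req_file,"html.parser")
tags=soup('a')
for tag in tags:
    print(tag.get('href',None))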
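As a rough sketch of the same idea that can be tried without typing a URL, the parsing step can be wrapped in a small function and run on an inline page. The function name extract_links and the sample markup are made up for illustration, and the sketch assumes bs4 is installed.

```python
from bs4 import BeautifulSoup


def extract_links(markup):
    """Return the href value of every <a> tag in markup that has one."""
    soup = BeautifulSoup(markup, "html.parser")
    # href=True restricts the search to anchors that carry an href attribute
    return [tag["href"] for tag in soup.find_all("a", href=True)]


# Made-up sample page standing in for urllib.request.urlopen(url).read()
sample = (
    b'<p><a href="/docs">Docs</a> '
    b'<a name="top">no link</a> '
    b'<a href="https://example.com">Example</a></p>'
)

links = extract_links(sample)
print(links)
```

Unlike tag.get('href', None), which yields None for anchors without an href, this version skips those anchors entirely. Which behaviour is preferable depends on what you do with the results.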