I have a Python script that uses MechanicalSoup's StatefulBrowser to open a URL. It used to work, but it recently stopped working for one specific website, even though I haven't changed any code.
Other websites still open fine. This is the specific website that fails to open: http://a810-bisweb.nyc.gov/bisweb/ComplaintsByAddressServlet?allbin=4606689
import mechanicalsoup
browser = mechanicalsoup.StatefulBrowser()
# open url test
url = "http://www.cnn.com"
print("opening website: {}".format(url))
browser.open(url)
print("done website: {}".format(url))
url = "http://a810-bisweb.nyc.gov/bisweb/ComplaintsByAddressServlet?allbin=4606689"
print("opening website: {}".format(url))
browser.open(url)
print("done website: {}".format(url))
Below is the output I got. www.cnn.com opened as expected, but the second link just hangs.
Any help? Or if anyone knows of a way to contact the MechanicalSoup developers, please let me know.
Output:
opening website: http://www.cnn.com
done website: http://www.cnn.com
opening website: http://a810-bisweb.nyc.gov/bisweb/ComplaintsByAddressServlet?allbin=4606689
... hangs ...
Thank you.
Many portals block a connection if it has the wrong "User-Agent" header, which tells the server which web browser is being used to connect.
Python's tools (like requests) often put the word Python in User-Agent, so the server can recognize that the client is not a real web browser and block the connection.
If I use the text "Mozilla/5.0" as the User-Agent, then I can connect again:
browser = mechanicalsoup.StatefulBrowser()
browser.set_user_agent('Mozilla/5.0')
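To see how a default User-Agent gives a script away, here is a minimal sketch using only the standard library's urllib. I use urllib here only because it needs no extra install; requests behaves similarly and sends a value starting with python-requests by default. The helper name default_urllib_user_agent is my own, not part of any library.

```python
import urllib.request


def default_urllib_user_agent():
    """Return the User-Agent that urllib sends when none is set explicitly."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs added to every request
    return dict(opener.addheaders)["User-agent"]


# The value contains the word "Python", which a server can easily filter on.
print(default_urllib_user_agent())
```

No request is sent here; the code only inspects the header that would be sent.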
The text "Mozilla/5.0" is not the full string that a real web browser sends, so you may want to use a more complete one. There is probably also a Python module that provides User-Agent strings from different web browsers, so you could use a different value on different days.
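As a rough sketch of the "different values on different days" idea, something like the following could work. The strings in USER_AGENTS are only illustrative examples of what full browser User-Agent values look like and may be outdated; the names USER_AGENTS, pick_user_agent and open_with_daily_agent are hypothetical helpers of mine, not part of MechanicalSoup. I also pass a timeout, on the assumption that you would rather get an exception than a hang if the server stays silent.

```python
import datetime

# Illustrative full User-Agent strings (assumed examples, may be outdated).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.2 Safari/605.1.15",
]


def pick_user_agent(day=None):
    """Pick one User-Agent per calendar day, cycling through the list."""
    if day is None:
        day = datetime.date.today()
    return USER_AGENTS[day.toordinal() % len(USER_AGENTS)]


def open_with_daily_agent(url, timeout=10):
    """Open url with today's User-Agent; raise instead of hanging forever."""
    import mechanicalsoup  # imported here so pick_user_agent works without it

    browser = mechanicalsoup.StatefulBrowser(user_agent=pick_user_agent())
    # Extra keyword arguments to open() are forwarded to requests,
    # so timeout is in seconds.
    return browser.open(url, timeout=timeout)
```

Calling open_with_daily_agent(url) with the URL from the question would then either return the response or raise a requests timeout error after about 10 seconds.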