python mechanicalsoup

MechanicalSoup StatefulBrowser: Unable to Open URL


I have a Python script that uses MechanicalSoup's StatefulBrowser to open a URL. It used to work, but recently it stopped working for one specific website, even though I haven't changed any code.

Other websites still open fine. This is the specific website that fails to open: http://a810-bisweb.nyc.gov/bisweb/ComplaintsByAddressServlet?allbin=4606689

import mechanicalsoup

browser = mechanicalsoup.StatefulBrowser()

# open url test
url = "http://www.cnn.com"
print("opening website: {}".format(url))
browser.open(url)
print("done website: {}".format(url))

url = "http://a810-bisweb.nyc.gov/bisweb/ComplaintsByAddressServlet?allbin=4606689"
print("opening website: {}".format(url))
browser.open(url)
print("done website: {}".format(url))

The output below shows that www.cnn.com opened as expected, but the second URL just hangs.

Any help? Or if anyone knows a way to contact the MechanicalSoup developers, please let me know.

Output:

opening website: http://www.cnn.com
done website: http://www.cnn.com
opening website: http://a810-bisweb.nyc.gov/bisweb/ComplaintsByAddressServlet?allbin=4606689
... hangs ...

Thank you.


Solution

  • Many sites block a connection when its "User-Agent" header, which tells the server which web browser is connecting, has an unexpected value.

    Python tools (like requests) often put the word "python" in the User-Agent, so the server can recognize that the client is not a real web browser and block the connection.

    If I set the User-Agent to "Mozilla/5.0", I can connect again:

    browser = mechanicalsoup.StatefulBrowser()
    browser.set_user_agent('Mozilla/5.0')
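Since the original symptom was a hang rather than an error, it may also help to put a time limit on the request so a blocked connection fails visibly. The following is only a sketch: `open_or_report` and `demo` are hypothetical helper names, and it assumes that extra keyword arguments to `browser.open()` are forwarded to requests, which accepts `timeout`.

```python
def open_or_report(browser, url, timeout=10):
    """Open url with a time limit; return the response, or None on failure.

    `browser` is expected to be a mechanicalsoup.StatefulBrowser (or any
    object whose open() accepts a `timeout` keyword argument).
    """
    try:
        # `timeout` is how many seconds to wait for the server to respond.
        return browser.open(url, timeout=timeout)
    except OSError as exc:
        # requests' exceptions (timeouts, connection errors) derive from OSError.
        print("failed to open {}: {}".format(url, exc))
        return None


def demo():
    """Hypothetical usage; needs mechanicalsoup installed and network access."""
    import mechanicalsoup

    browser = mechanicalsoup.StatefulBrowser()
    browser.set_user_agent("Mozilla/5.0")
    url = "http://a810-bisweb.nyc.gov/bisweb/ComplaintsByAddressServlet?allbin=4606689"
    response = open_or_report(browser, url)
    if response is not None:
        print("done website: {} (status {})".format(url, response.status_code))
```

With a timeout in place, a server that silently ignores the request should produce a "failed to open" message after about ten seconds instead of blocking the script forever.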
    

    "Mozilla/5.0" is not the full string that a real web browser sends, so you may want to find a more complete one. There should also be a Python module that provides User-Agent strings from different web browsers, so you can use different values on different days.
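The idea of using a different value on different days can be sketched without any extra module. The User-Agent strings below are illustrative examples of the general format and may be out of date, and `user_agent_for` and `make_browser` are hypothetical helper names, not part of MechanicalSoup.

```python
import datetime

# Illustrative User-Agent strings only; real browsers change theirs often,
# so replace these with values copied from a current browser.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.1 Safari/605.1.15",
]


def user_agent_for(day):
    """Pick a User-Agent deterministically from a datetime.date."""
    # toordinal() increases by one each day, so the choice cycles daily.
    return USER_AGENTS[day.toordinal() % len(USER_AGENTS)]


def make_browser(day=None):
    """Hypothetical helper; needs mechanicalsoup installed."""
    import mechanicalsoup

    if day is None:
        day = datetime.date.today()
    browser = mechanicalsoup.StatefulBrowser()
    browser.set_user_agent(user_agent_for(day))
    return browser
```

Calling `make_browser()` would then give a browser whose User-Agent changes from one day to the next while staying the same within a single day.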